Progress is made on detecting a stylus using computer vision and convolutional neural networks.
This post records a minor success in my computer vision project: training a LeNet-5 style convolutional neural net to recognize a stylus. The structure of a convolutional neural network is designed to automatically generate image features that are significant for detecting categories.
The object of the game for my project is to be able to examine any given patch in an image and detect the presence of the stylus. And, if the stylus is detected, then determine the dimensions and location of the stylus. While categorization was not too difficult, the regression portion, determining the location, has been and remains problematic.
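The patch-by-patch process can be sketched as a sliding-window scan. This is only an illustration of the idea, not my actual pipeline; the function names and the patch/stride sizes are made up for the example:

```python
import numpy as np

def iter_patches(image, patch_size, stride):
    """Yield (row, col, patch) for each patch in a sliding-window scan."""
    h, w = image.shape
    for r in range(0, h - patch_size + 1, stride):
        for c in range(0, w - patch_size + 1, stride):
            yield r, c, image[r:r + patch_size, c:c + patch_size]

def scan(image, classify, locate, patch_size=28, stride=14):
    """Stage 1 on every patch; stage 2 only where a stylus is detected."""
    detections = []
    for r, c, patch in iter_patches(image, patch_size, stride):
        if classify(patch):                          # presence of the stylus
            detections.append((r, c, locate(patch)))  # location/dimensions
    return detections
```

Here `classify` and `locate` stand in for the two jobs the network has to do on each patch.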
The neural net was created using PyNeurGen, with modified classes for convolutional and subsampling layers. A StochasticNode class, created for drop-out nodes, simulates unreliable sensors; basically, drop-outs are a method of regularization. At some point, when the code settles down, I will add those classes to the PyNeurGen project.
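The drop-out idea behind such a node is simple. The sketch below is my own illustration, not the actual PyNeurGen class or its API: during training the node's activation is zeroed with some probability, and at test time the activation is scaled to compensate:

```python
import random

class StochasticNode:
    """Illustrative drop-out node: during training, its activation is
    zeroed with probability drop_prob, simulating an unreliable sensor.
    (A hypothetical sketch, not the real PyNeurGen class.)"""

    def __init__(self, drop_prob=0.5):
        self.drop_prob = drop_prob
        self.training = True

    def activate(self, value):
        if self.training:
            if random.random() < self.drop_prob:
                return 0.0                        # node "fails" this pass
            return value                          # survives unchanged
        return value * (1.0 - self.drop_prob)     # scale at test time
```

Averaged over many passes, the test-time scaling keeps the expected activation consistent with training.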
My initial approach used an output that signified category and location simultaneously, reasoning that the same inputs feed both calculations. If the stylus was not detected, all the location coordinates would be zero; if detected, the outline would be drawn.
However, all too often, the tentative lines that appeared were erratic. For each and every point, the network was deciding both (1) whether a stylus was present, and (2) either the specific location of that point or a zero. It makes more sense to detect the stylus's presence first, and then, given that belief, calculate the location.
Running additional epochs might have sufficiently dissipated the problems, but intuitively it seemed like a poor approach.
Eventually, it occurred to me that a better structure would add an additional output layer. In the new format, only the two categorization nodes sit in the former output layer. Then, using that a priori determination of category, the category nodes can inform the location nodes without ambiguity.
In a sense, the network should be viewed as two networks, category and regression, with sparse links between them.
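The two-networks-with-sparse-links view can be sketched as a forward pass in which the category subnet's output is appended to the regression subnet's input. Everything here, the function name, the sigmoid choice, and the weight shapes, is my own illustration, not PyNeurGen code:

```python
import numpy as np

def two_stage_forward(features, w_cat, w_loc, w_gate):
    """Stage 1: category belief from the shared features.
    Stage 2: location from the same features plus that belief.
    The only links between the two sub-networks are w_gate."""
    cat = 1.0 / (1.0 + np.exp(-(features @ w_cat)))    # presence belief
    loc_in = np.concatenate([features, cat])           # category informs location
    loc = loc_in @ np.concatenate([w_loc, w_gate])     # regression output
    return cat, loc
```

The sparse links are just the few weights carrying the category belief into the location calculation.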
The new output layer holds the two category nodes and all of the location nodes. The two category nodes are fully connected to the previous layer, and so are the location nodes. In addition, the location nodes use skip-layer connections reaching back past that layer to the layer before it. As a result, the category nodes are duplicated across the two layers, but the location nodes keep all of their existing connection weights.
In the new structure, the location nodes can use the previous layer's category nodes as a reasonable guide when calculating position. At the same time, the additional computation is minimized, because the old output layer, now holding only the category nodes, requires far fewer calculations.
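A forward pass through this revised final layer might look like the following. The names and shapes are mine, chosen for the example rather than taken from PyNeurGen; the point is that the old location weights survive as skip-layer connections while the new category-to-location links are added on top:

```python
import numpy as np

def output_forward(general, prev_cat, w_cat_new, w_loc_skip, w_loc_cat):
    """Revised final layer (illustrative):
    - prev_cat: the two category activations from the now category-only layer
    - general: the earlier generalized layer, reached by skip connections
    - w_loc_skip: the location nodes' preserved original weights"""
    cat_out = prev_cat @ w_cat_new                 # duplicated category nodes
    # Location nodes: preserved skip-layer weights plus new sparse links
    # from the category nodes.
    loc_out = general @ w_loc_skip + prev_cat @ w_loc_cat
    return cat_out, loc_out
```

Because `w_loc_skip` is unchanged from the old network, training only has to learn the new, much smaller weight matrices.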
The following diagram shows the current network layout.
Is this the best way?
This may not be the best approach. In one sense, I have been trying to honor the traditional structure of a neural network. One of the key concepts has been that all the nodes on each layer are fed forward to the next, from input to output. However, a short-cut could have been used. For example, in my original structure, where the output layer mixed category and location nodes, there could have been an if/then type switch segmenting the feed-forward based on the output of nodes on the same layer. While one would lose vectorization advantages in feed-forwarding, the trade-offs might be sufficient to justify its use.
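Such a switch might look like this sketch (again my own illustration, with made-up names and a single category node for simplicity). The branch is exactly what breaks the one-matrix-multiply vectorization of the layer:

```python
import numpy as np

def switched_forward(hidden, w_cat, w_loc, threshold=0.5):
    """Mixed output layer with an explicit if/then switch: the location
    nodes are fed forward only when the category node fires.  The branch
    trades vectorization for skipped regression work on empty patches."""
    presence = 1.0 / (1.0 + np.exp(-(hidden @ w_cat)))   # category node
    if presence < threshold:
        return presence, np.zeros(w_loc.shape[1])        # skip regression
    return presence, hidden @ w_loc                      # compute location
```

Whether the skipped work outweighs the lost vectorization would depend on how often a stylus is actually present.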
So, additional avenues of exploration are still possible.
Because ambiguity is reduced in my current structure, I will also be examining whether I can reduce the size of the generalized layer, which could save considerable time and memory.