Machine Understanding



March 8, 2010

Evaluating HTMs, Part 5: More on Internal Operations of Nodes

See also Part 1, Part 2, Part 3, Part 4

Section 4, "How does each node discover and infer causes?", of "Hierarchical Temporal Memory, Concepts, Theory, and Terminology" by Hawkins and George covers questions about the internal operations of nodes that were raised or only partly answered in earlier sections.

Basically, each node simultaneously learns and recognizes spatial and temporal patterns. Its output is information about those patterns that can be sent up and down the hierarchy of nodes.

Spatial patterns do not necessarily mean space as in the space-time continuum, although the example used in the essay is of a two-dimensional visual space. Space is used in the mathematical sense. The data can be anything quantifiable, and the space can have any number of dimensions. For instance, the maximum daily surface temperature of the earth would constitute a space: a range of degrees Centigrade, at whatever level of precision is desired, on a spherical 2D grid representing the surface of the earth at whatever resolution is desired. The time sequence in this example would be daily samples. In a digital audio example the time sequence intervals might be something on the order of 0.0001 seconds.

The node has a significant number of "quantization points" available to categorize the spatial data. Only the most common data patterns, up to the number of quantization points, will be learned. Anything that is not one of the learned patterns will be assigned a probability that it is one of the learned patterns plus some noise.
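Here is a rough Python sketch of how I picture the quantization step. It is not the authors' algorithm: the function names, the tuple representation of patterns, and the Gaussian noise model (with its `sigma` parameter) are all my own assumptions for illustration.

```python
from collections import Counter
import math

def learn_quantization_points(patterns, max_points):
    """Keep only the most frequently seen patterns, up to max_points."""
    counts = Counter(patterns)
    return [pattern for pattern, _ in counts.most_common(max_points)]

def match_scores(pattern, points, sigma=1.0):
    """Relative likelihood that `pattern` is each learned point plus noise.

    Assumption: Gaussian noise with spread `sigma`. The essay does not
    specify a noise model, so this is only one plausible choice.
    """
    scores = []
    for point in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(pattern, point))
        scores.append(math.exp(-sq_dist / (2 * sigma ** 2)))
    return scores

# Hypothetical data: (0, 0) seen five times, (1, 1) three times, (5, 5) once.
observed = [(0, 0)] * 5 + [(1, 1)] * 3 + [(5, 5)]
points = learn_quantization_points(observed, max_points=2)
scores = match_scores((0, 1), points)
```

With only two quantization points available, the rare pattern `(5, 5)` is never learned. The scores are deliberately left unnormalized, so an input that scores low against every point can be read as probably matching none of them.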

Having learned the quantization points, the node can start looking for common temporal sequences of them. Again, a limited number of points or memory units are allocated for learned temporal patterns. I can't find where the authors give them a name, so at the risk of being corrected later, I'll call them temporal pattern points. Again, temporal pattern matches don't need to be exact; some noise is tolerated.
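A minimal sketch of the temporal step, under my own simplifying assumptions: sequences have a fixed length, the input is a list of already-identified quantization point ids, and "most common" is decided by counting overlapping windows. The essay does not say the node works this way; the function name and parameters are hypothetical.

```python
from collections import Counter

def learn_temporal_patterns(point_stream, length, max_patterns):
    """Keep the most common runs of `length` consecutive quantization points."""
    # Overlapping sliding windows over the stream of point ids.
    windows = zip(*(point_stream[i:] for i in range(length)))
    counts = Counter(windows)
    return [seq for seq, _ in counts.most_common(max_patterns)]

# Hypothetical stream of quantization point ids over time.
stream = [49, 3, 7, 49, 3, 7, 49, 3, 42, 49, 3, 7]
temporal_points = learn_temporal_patterns(stream, length=3, max_patterns=2)
```

As with the quantization points, the memory limit (`max_patterns`) means rarer sequences such as 49, 3, 42 are simply not stored in this example.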

Once learning has taken place (and learning can continue), the node can work to infer causes. The patterns held in the quantization points, as well as the patterns of these in time held in the temporal pattern points, can be causes (or call them objects, which is the more typical if less precise vocabulary). As time passes the data changes, and the causes output to the higher-level node(s) of the hierarchy change with it.

Another very important idea to wrap your head around is that the node, and the HTM, need to deal with probabilities. You might get an exact match with 100% probability; you might even be able to design special situations where you don't need to deal with probability. But the whole point of HTMs (from an engineering standpoint) is to be able to deal with complex real-world data, in a manner similar to the human brain. So whether matching a spatial pattern to the quantization points, or a temporal pattern to the temporal pattern points, you need to think in terms of probability. There is a 42% chance that it matches point 7, an 18% chance that it matches point 42, and a 40% chance that it matches none of the quantization points. With the temporal pattern it gets more complicated: just because a spatial pattern is most likely point 7, you can't assume the temporal pattern goes, say, 49, 3, 7 rather than 49, 3, 42.
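To make the 49, 3, 7 versus 49, 3, 42 case concrete, here is a sketch that carries both candidates forward instead of committing to point 7. The assumptions are mine: each time step's belief is a dict of point id to probability, steps are treated as independent, and the 0.9 and 0.8 values for the first two steps are made up. Only the 42% and 18% figures come from the paragraph above.

```python
def sequence_probabilities(step_beliefs, learned_sequences):
    """Normalized probability of each learned sequence given per-step beliefs.

    step_beliefs: one dict per time step mapping point id -> probability.
    Assumption: time steps are treated as independent, which is a
    simplification, not something the essay states.
    """
    scores = {}
    for seq in learned_sequences:
        score = 1.0
        for belief, point in zip(step_beliefs, seq):
            score *= belief.get(point, 0.0)
        scores[seq] = score
    total = sum(scores.values())
    if total == 0:
        # No learned sequence is compatible with the observations.
        return {seq: 0.0 for seq in scores}
    return {seq: s / total for seq, s in scores.items()}

# Last step uses the 42% / 18% figures; the first two steps are invented.
beliefs = [{49: 0.9}, {3: 0.8}, {7: 0.42, 42: 0.18}]
result = sequence_probabilities(beliefs, [(49, 3, 7), (49, 3, 42)])
```

The normalization here is conditional on the input being one of the learned sequences, so the 40% "none of the above" mass is dropped. The sequence ending in 42 still keeps a real share of the probability, which is the point: the less likely spatial match is not thrown away.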

Hey, but that is what computers are handy for: figuring probabilities and keeping track of them.

The output to the higher-level nodes might be thought of as: there is a 12% probability that we are seeing temporal pattern point 7, 35% that it is point 16, 48% that it is point 34, and 5% that it is point 49. It seems messy to us, but it is exactly the sort of thing the next level up is looking for. We call it a vector output, but it can also be thought of as a set of pairs, each matching a probability to a point that has been arbitrarily assigned to a spatial-temporal pattern.
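The two views of the output, a set of pairs and a vector, can be sketched like this. The dict holds the example numbers from the paragraph above; the total of 50 temporal pattern points and the 0-based ids are my own assumptions.

```python
# Pairs view: temporal pattern point id -> probability.
output = {7: 0.12, 16: 0.35, 34: 0.48, 49: 0.05}

def as_vector(pairs, num_points):
    """Dense vector with one slot per temporal pattern point (0-based ids)."""
    vector = [0.0] * num_points
    for point, prob in pairs.items():
        vector[point] = prob
    return vector

vector = as_vector(output, num_points=50)  # 50 points is an assumed size
most_likely = max(output, key=output.get)
```

The point ids carry no meaning of their own; the next level up only sees which slots carry how much probability.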

Hey, I think I am beginning to understand this stuff.

Which brings me back to one of those basic science and philosophy questions that made me interested in machine understanding in the first place. If everything is abstract, how do we (the HTM or a living human brain) get the picture of the world that seems so familiar to us? If all the world does is produce neuronal impulses in our bodies, what makes red different from blue, and what makes the junk on my desk resolve so easily into envelopes, pens, gadgets, fake wood patterns and a host of other things?

Next: Why is Time Necessary to Learn?