April 4, 2010
Understanding the Bitworm NuPIC HTM Example Program, Part 3
Spatial and Temporal Pool Overview
In Understanding Bitworm Part 2 I wrote: "Among the other parameters of CreateNode you can see spatialPoolerAlgorithm and temporalPoolerAlgorithm. I don't think I have used "pooling" yet. Remember I wrote about quantization points? [See How do HTMs Learn?] There are a number of available points both for spatial and temporal patterns in the unsupervised nodes. They need to be populated, and they may change during the learning phase. Pooling appears to be NuSpeak for this process; a pooler algorithm is the code that matches up incoming data to quantization points."
To learn about the pooling algorithms I went to the Numenta Node Algorithm Guide, which is not at the Numenta web site, but installs with NuPIC under \Program Files\Numenta\nupic-1.7.1\share\doc\NodeAlgorithmsGuide.pdf.
There are two node types that implement the NuPIC learning algorithms: SpatialPoolerNode and TemporalPoolerNode.
Some confusion might exist because in more general Numenta discussions a node is treated as a single entity, but both a spatial and a temporal node are needed to create a functioning general node. When the unsupervised node in Bitworm is created with CreateNode(Zeta1Node,...), in effect both a SpatialPoolerNode and a TemporalPoolerNode are created to get full functionality. The general discussions assume both node types sit in the same level of the HTM hierarchy. But you can design more complicated patterns by arranging SpatialPoolerNode and TemporalPoolerNode in an HTM as needed, rather than always pairing them on a level.
"Spatial pooling can be thought of as a quantization process that maps a potentially infinite number of input patterns to a finite number of quantization centers." In other literature Numenta calls these centers quantization points. Data, in our HTM world, has a spatial aspect. This need not mean change along a spatial dimension; space has a more general sense here. For instance, the space might be a range of voltages, or sets of voltages from an EKG. Spatial data usually varies in such complex ways that we are only interested in the data that is created by objects, or causes. Spatial pooling groups the data into a limited number of causes (or guesses about causes).
Temporal pooling does the same thing with the patterns (objects) identified by the spatial pooler over time sequences. "If pattern A is frequently followed by pattern B, the temporal pooler can assign them to the same group."
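To make the "A is frequently followed by B" idea concrete, here is a minimal sketch in Python. It is not Numenta's algorithm, which the Guide does not give. It simply counts how often one pattern follows another and merges patterns whose transitions occur at least a threshold number of times. The function name temporal_groups and the min_count threshold are made up for this illustration.

```python
from collections import Counter

def temporal_groups(sequence, min_count):
    """Group patterns that follow each other at least min_count times.

    Hypothetical illustration only; not the NuPIC temporal pooler.
    """
    # Count every observed transition (previous pattern, next pattern).
    transitions = Counter(zip(sequence, sequence[1:]))
    # Start with every pattern in its own group.
    group_of = {p: {p} for p in sequence}
    for (a, b), n in transitions.items():
        if n >= min_count and group_of[a] is not group_of[b]:
            merged = group_of[a] | group_of[b]
            for p in merged:
                group_of[p] = merged
    # Collect the distinct groups in a stable, sorted form.
    unique = {frozenset(g) for g in group_of.values()}
    return sorted(sorted(g) for g in unique)

seq = list("ABABABCDCDCDABAB")
groups = temporal_groups(seq, min_count=2)
print(groups)
```

In this toy sequence A and B alternate often, as do C and D, while the hops from B to C and from D to A each happen only once, so the sketch ends up with two groups.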
A group of nodes forming an HTM level may be able to form invariant representations of objects by combining spatial and temporal pooling. If it can, it passes these representations up the hierarchy.
Once learning is achieved the nodes can be used for inference: they can identify new data as containing patterns that have already been learned.
For now I will focus on the learning phase, since the inference phase is relatively easy to understand if you understand how learning takes place.
I just realized the paper I am reading does not actually give the algorithms used. However, the key algorithm is probably related to the maxDistance parameter. Distance here could be ordinary distance, but it is more likely to be distance within a generalized, possibly many-dimensional, heterogeneous pattern space. All kinds of problems leap to mind for writing such a generalized algorithm. I would bet that space/data specific algorithms would really help here (sound vs. sight vs. spatial orientation of human fingers), but perhaps if the quantification is always done before the data is fed in, it is just a matter of matching numbers. Anyway, if you have a distance function, you can group the spatial patterns as falling around a set of centers. These centers are your quantization points. As discussed elsewhere these points are flexible; if a lot of patterns fall close to each other, you might want to tighten up the distance parameter because otherwise you don't use all your allocation of quantization points. That should happen automatically, but either it doesn't, so you need to set the maxDistance parameter, or it does but you still have the option of disagreeing with the automatic or default settings.
Your number of quantization points is set by maxCoincidenceCount. "Storing too few coincidence patterns can result in loss of accuracy due to loss of information. Storing too many coincidence patterns can result in lower generalization and longer training times."
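As a rough sketch of how a distance threshold and a cap on stored patterns could interact, consider the Python below. This is a guess at the general shape, not Numenta's code. Euclidean distance, the greedy "store it if nothing is close" rule, and the function names are all assumptions; the max_distance and max_coincidence_count arguments only mimic the roles of maxDistance and maxCoincidenceCount.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def learn_quantization_points(patterns, max_distance, max_coincidence_count):
    """Greedy sketch: store a pattern as a new quantization point only if
    it is farther than max_distance from every point stored so far, and
    only while there is room left under max_coincidence_count."""
    points = []
    for p in patterns:
        if any(euclidean(p, q) <= max_distance for q in points):
            continue  # close enough to an existing point; nothing new to store
        if len(points) < max_coincidence_count:
            points.append(tuple(p))
    return points

patterns = [(0, 0), (0.1, 0), (1, 1), (1, 1.05), (5, 5)]
loose = learn_quantization_points(patterns, max_distance=0.5, max_coincidence_count=10)
tight = learn_quantization_points(patterns, max_distance=0.01, max_coincidence_count=10)
capped = learn_quantization_points(patterns, max_distance=0.01, max_coincidence_count=3)
print(len(loose), len(tight), len(capped))
```

With the loose threshold, nearby patterns collapse onto three points. Tightening the threshold stores all five, and the cap then cuts that back to the first three seen, which is one way too small a count could lose information.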
You can also set the sigma parameter. Here's another insight into the algorithm: "each input pattern is compared to the stored patterns assuming that the stored patterns are centers of radial basis functions with Gaussian tuning. The sigma parameter specifies the standard deviation of the Gaussian [distribution]." So this would work, along with maxDistance, in matching incoming data patterns to existing quantization points.
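The quoted description suggests something like the following sketch, which scores an input against each stored pattern with an unnormalized Gaussian of the squared Euclidean distance. The exact formula, the choice of distance, and the name gaussian_match are assumptions for illustration; the Guide does not spell them out.

```python
import math

def gaussian_match(pattern, stored_patterns, sigma):
    """Score a pattern against each stored pattern with
    exp(-d^2 / (2 * sigma^2)), where d is Euclidean distance.
    Hypothetical illustration, not the NuPIC implementation."""
    scores = []
    for center in stored_patterns:
        sq_dist = sum((x - c) ** 2 for x, c in zip(pattern, center))
        scores.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return scores

stored = [(0.0, 0.0), (1.0, 1.0)]
narrow = gaussian_match((0.0, 0.0), stored, sigma=0.5)
wide = gaussian_match((0.0, 0.0), stored, sigma=5.0)
print(narrow, wide)
```

A small sigma makes the match sharp, so only the nearest stored pattern scores well. A large sigma makes it forgiving, so distant stored patterns also get a high score.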
The clonedNodes parameter allows a set of spatial nodes to use the same coincidence patterns. This allows all the nodes in a level to detect the same causes. In vision that could be moving lines, spots, etc.
The spatial pooler nodes take inputs with the bottomUpIn parameter. The spatial pattern outputs in inference mode are in bottomUpOut; outputs in learning mode go to a temporal pooler.
Temporal pooling has more options than spatial pooling, in particular offering parameters for both first-order and higher-order learning.
Your number of temporal groups, or time quantization points, is set by requestedGroupCount.
You can select a variety of algorithms to use to compute output probabilities with the temporalPoolerAlgorithm parameter, but it has no impact on the learning algorithm.
There are a number of sequencer parameters that allow control of the algorithm. sequencerWindowCount allows for multiple stages of discovery (the default is 10). sequencerWindowLength allows segmentation of the input sequence to look for patterns. sequencerModelComplexity apparently allows you to adjust for how the recognizable patterns are balanced between the spatial and temporal dimensions. Some objects produce mainly spatial patterns, others mainly temporal, and most combine the two to a greater or lesser degree.
As with SpatialPoolerNode, you can clone the nodes if you desire. bottomUpIn takes the data in from one or more spatial pooler nodes. bottomUpOut is the resulting vector of real numbers representing "the likelihood that the input belongs to each of the temporal groups of this node."
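The Guide does not say how that likelihood vector is computed, so the following is only a guess at the shape of the calculation. It assumes each spatial pattern already has a score and combines the scores of the patterns in each temporal group, either by taking the largest or by adding them up. The function name group_likelihoods, the sample scores, and both combining rules are assumptions.

```python
def group_likelihoods(pattern_scores, groups, combine=max):
    """Combine per-pattern scores into one score per temporal group.
    Hypothetical illustration; `combine` is any function of an iterable."""
    return [combine(pattern_scores[p] for p in group) for group in groups]

# Made-up scores for four spatial patterns and two temporal groups.
scores = {"A": 0.7, "B": 0.2, "C": 0.05, "D": 0.05}
groups = [["A", "B"], ["C", "D"]]
by_max = group_likelihoods(scores, groups, combine=max)
by_sum = group_likelihoods(scores, groups, combine=sum)
print(by_max, by_sum)
```

Either way the output has one real number per temporal group, which matches the description of bottomUpOut even if the real computation differs.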
In addition to parameters, TemporalPoolerNode takes a command: predict, but it works only in inference mode.
Despite not revealing the details of the algorithms, the Guide, plus the previous materials I read, gave me a good overview of what the algorithms need to achieve. I am pretty sure that I would write algorithms that do approximately what the Numenta pooling algorithms do, but since they have been playing with this for years, I would rather catch up by examining the code inside the Numenta classes.
See also: More on Internal Operations of Nodes