Here is an overview of the Feynman Machine architecture used in the OgmaNeo library, followed by an example for video prediction (recall).
For the original Feynman Machine paper, see https://arxiv.org/abs/1609.03971.
A High Level Look at the Feynman Machine
The Feynman Machine is a hierarchical sequence prediction algorithm that functions on the basis of coupled dynamical systems.
The Feynman Machine is implemented in the OgmaNeo C++ library, with bindings to Python and Java. Demos for the C++ version can be found here.
What sets the Feynman Machine apart from other deep learning architectures is its focus on fully real-time, online spatio-temporal learning, local computation (no backpropagation), and speed.
Encoders and Decoders
A Feynman Machine hierarchy can take many forms, but for now we will focus on a simple stack of 2D layers. Each layer consists mainly of two parts: An encoder and a decoder.
The encoder attempts to model the underlying dynamical systems observed in the data. It produces a sparse code that represents the current spatio-temporal state.
The decoder maps from the state of the encoder either back to the input of the encoder (the state of the underlying dynamical system) or to the advanced state (next timestep) of the encoder. It does this by taking both the current layer's encoder state and the next higher layer's decoder state into account. This forms a prediction of the next timestep, which can be used as the output of the model, or fed into a lower layer to improve that layer's predictions.
Information flows in two directions: Up and down. We therefore separate processing into an up pass and a down pass. In the up pass (encoder pass), we attempt to model the inputs by extracting sparse spatio-temporal features. Each encoder extracts features from the state of the encoder below it:
Where the red states are active units in the encoder (binary and sparse).
In the down (decoding) pass, we then combine higher layer predictions with current layer state to predict the next state of the encoder (or the input of the encoder, depending on the setup). This allows us to avoid backpropagation of errors, since we know the targets of predictions at each layer are simply the next timestep (t + 1) of the state of the encoder.
In most of our experiments, a simple linear combination will suffice for the decoder, trained with the perceptron delta rule.
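As an illustration only (this is not the OgmaNeo implementation, and all names are hypothetical), a linear decoder trained with the delta rule can be sketched as follows. The decoder predicts the next encoder state from the current encoder state concatenated with the feedback from the layer above, and each weight is nudged by the prediction error times its input.
#include <vector>
// Minimal sketch of a linear decoder trained with the perceptron delta rule
// (illustrative only, not the OgmaNeo implementation). 'state' is the current
// layer's encoder state concatenated with the feedback from the layer above;
// 'target' is the encoder state observed at the next timestep.
struct LinearDecoder {
    std::vector<std::vector<float>> weights; // [output unit][input]
    float alpha = 0.04f;                     // learning rate
    std::vector<float> predict(const std::vector<float> &state) const {
        std::vector<float> out(weights.size(), 0.0f);
        for (size_t i = 0; i < weights.size(); i++)
            for (size_t j = 0; j < state.size(); j++)
                out[i] += weights[i][j] * state[j];
        return out;
    }
    // Delta rule: weight += alpha * (target - prediction) * input.
    void learn(const std::vector<float> &state, const std::vector<float> &target,
               const std::vector<float> &prediction) {
        for (size_t i = 0; i < weights.size(); i++) {
            float error = target[i] - prediction[i];
            for (size_t j = 0; j < state.size(); j++)
                weights[i][j] += alpha * error * state[j];
        }
    }
};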
The encoder, however, is a bit trickier. We have experimented with many encoder types and architectures, and have included some of the best so far in the OgmaNeo release.
For this article we will focus on the Chunk Encoder, a particularly general-purpose encoder with desirable properties.
The Chunk Encoder
The Chunk Encoder is essentially a grid of tiled (chunked) self-organizing maps (SOMs) with temporal modeling capabilities.
This means that each chunk (tile) is self-contained, and produces a single active bit in its representation (the best matching unit in the SOM).
Each SOM looks sort of like this:
Where the green and grey units are the hidden units, and the blue units are the inputs. The network is fully connected within a single chunk, but sparsely connected (local radii) outside of a chunk.
The different shades of green represent the influence the winning neuron (red, the best matching unit or BMU) has on its neighbors. When influenced, a neuron drives its synapses towards the currently observed pattern. The competitive-collaborative nature of the SOM ensures that the resulting representation is sparse and contains no “dead” (unused) units.
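To make the update concrete, below is a minimal sketch of a plain SOM step for a single chunk, using the standard SOM formulation rather than the actual OgmaNeo kernel: find the best matching unit, then pull every unit's weights towards the input with a strength that falls off with grid distance from the BMU. All names here are illustrative.
#include <cmath>
#include <limits>
#include <vector>
// Illustrative SOM update for one chunk (not the OgmaNeo kernel). Each hidden
// unit has one weight per input; units lie on a chunkSize x chunkSize grid.
void somUpdate(std::vector<std::vector<float>> &weights, // [unit][input]
               const std::vector<float> &input,
               int chunkSize, float alpha, float sigma) {
    // 1. Find the best matching unit (BMU): smallest squared distance to the input.
    int bmu = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (int u = 0; u < (int)weights.size(); u++) {
        float d = 0.0f;
        for (int i = 0; i < (int)input.size(); i++) {
            float diff = input[i] - weights[u][i];
            d += diff * diff;
        }
        if (d < bestDist) {
            bestDist = d;
            bmu = u;
        }
    }
    // 2. Pull every unit towards the input, weighted by a Gaussian
    //    neighborhood centered on the BMU.
    int bx = bmu % chunkSize, by = bmu / chunkSize;
    for (int u = 0; u < (int)weights.size(); u++) {
        int ux = u % chunkSize, uy = u / chunkSize;
        float dist2 = (float)((ux - bx) * (ux - bx) + (uy - by) * (uy - by));
        float influence = std::exp(-dist2 / (2.0f * sigma * sigma));
        for (int i = 0; i < (int)input.size(); i++)
            weights[u][i] += alpha * influence * (input[i] - weights[u][i]);
    }
}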
To represent the evolution of its inputs over time, each chunk has two mechanisms: A second recurrent input layer, and per-input traces. Given enough time and units, the recurrent connections are sufficient to produce a good spatiotemporal code of the input. However, it doesn’t have “multi-timestep credit assignment” – it needs to propagate events back in time one step at a time. To address this problem, we augment the chunk with per-input memory traces.
The per-input memory traces are simply decaying running averages of each input. Different input sequences will generally produce distinct running averages at each timestep, so the traces form a simple one-shot recurrent memory. They store a compressed form of the input history and can be exploited without propagating credit back in time.
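A minimal sketch of such a trace, assuming a plain exponential decay (the exact OgmaNeo update may differ; gamma here is only in the spirit of the sfc_gamma parameter used later), looks like this:
#include <vector>
// Illustrative per-input memory trace: an exponentially decaying running
// average of each input. Two different input sequences generally leave
// different traces, so the trace acts as a cheap one-shot recurrent memory.
void updateTraces(std::vector<float> &traces, const std::vector<float> &input,
                  float gamma) { // e.g. gamma = 0.92f
    for (size_t i = 0; i < input.size(); i++)
        traces[i] = gamma * traces[i] + (1.0f - gamma) * input[i];
}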
Chunks are assembled into 2D layers in the OgmaNeo library; other dimensionalities are also possible.
Encoder and Decoder Combination
When combining encoders and decoders, we can use some additional trickery to improve performance of the system. One trick in particular relates to how information is passed upwards through the encoders.
When a lower layer can reliably predict the sequence it sees below it, there is no need to send any information upwards, since in the downwards pass the state of the current layer is enough to fully predict the input. If, on the other hand, the layer cannot fully predict its input, it makes sense to propagate only the errors (mistakes) it made to the next higher layer.
This results in a form of predictive coding. Each encoder no longer receives the raw state of the encoder below it as input, but rather the difference between that layer's prediction and the actual outcome. This means that when a prediction is correct, the encoder will receive only 0's; otherwise, it will receive an error signal.
Predictive coding not only reduces the amount of information flowing up the hierarchy, but also ensures that each layer learns to predict only what is necessary for an optimal prediction.
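For binary sparse states, one simple way to picture the signal sent upwards is a mismatch mask, as in the sketch below (illustrative only, not the OgmaNeo code): a perfectly predicted state yields an all-zero input to the layer above.
#include <vector>
// Illustrative predictive-coding signal for the next-higher encoder: only the
// mismatch between the layer's previous prediction and the actual encoder
// state is passed upwards. A perfect prediction produces all zeros.
std::vector<float> predictionError(const std::vector<float> &actual,
                                   const std::vector<float> &predicted) {
    std::vector<float> error(actual.size());
    for (size_t i = 0; i < actual.size(); i++)
        error[i] = (actual[i] != predicted[i]) ? 1.0f : 0.0f; // binary mismatch
    return error;
}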
Video Prediction Example
In this portion of the article, we will go over the video prediction example available in the OgmaNeo demos repository. This is one of the simplest demos, requiring only that the hierarchy effectively recall the input sequence. Nevertheless, it has been a vital demo for the development of the algorithm. Future demos will be added that better showcase the generalization characteristics of the Feynman Machine, but for a simple tutorial the video prediction will suffice.
To recall a sequence of video frames, we want to be able to predict each frame given the previous frame. This way, we can feed predictions back into the hierarchy as assumed correct input, allowing us to “replay” what the hierarchy has seen.
We will now go over the hierarchy creation code, in C++. The rest of the code is beyond the scope of this example, as it relies heavily on the SFML and OpenCV libraries to load and display videos. We assume OgmaNeo has been installed by following the instructions in the included README.
// --------------------------- Create the Hierarchy ---------------------------
// Acquire the compute resources (OpenCL context and kernels)
std::shared_ptr<ogmaneo::Resources> res = std::make_shared<ogmaneo::Resources>();
res->create(ogmaneo::ComputeSystem::_gpu);
// The Architect builds the hierarchy; initialize it with the resources and a seed
ogmaneo::Architect arch;
arch.initialize(1234, res);
Here we first acquire the resources necessary to create a Feynman Machine (an OpenCL context and kernels), then create an Architect. The Architect provides a simple interface for creating the most common forms of hierarchies. A lower-level interface is also available for finer control, but we will use the Architect interface for now.
The Architect is initialized with the resources and a seed.
We can then start adding layers using the Architect. Layers are 2D, and we want color video, so we will add 3 input layers for RGB components.
// 3 input layers for RGB
arch.addInputLayer(ogmaneo::Vec2i(width, height))
.setValue("in_p_alpha", 0.02f)
.setValue("in_p_radius", 8);
arch.addInputLayer(ogmaneo::Vec2i(width, height))
.setValue("in_p_alpha", 0.02f)
.setValue("in_p_radius", 8);
arch.addInputLayer(ogmaneo::Vec2i(width, height))
.setValue("in_p_alpha", 0.02f)
.setValue("in_p_radius", 8);
Each layer-adding call returns a ParameterModifier that allows one to modify properties of that layer. Here we set two properties per layer: the input prediction alpha (learning rate) and the input prediction radius (receptive field size). For a full list of parameters, please refer to the README included in OgmaNeo.
We can then add higher layers (encoder-decoder pairs) like so:
for (int l = 0; l < 4; l++)
arch.addHigherLayer(ogmaneo::Vec2i(60, 60), ogmaneo::_chunk)
.setValue("sfc_chunkSize", ogmaneo::Vec2i(6, 6))
.setValue("sfc_ff_radius", 8)
.setValue("hl_poolSteps", 2)
.setValue("sfc_numSamples", 2)
.setValue("sfc_weightAlpha", 0.01f)
.setValue("sfc_biasAlpha", 0.1f)
.setValue("sfc_gamma", 0.92f)
.setValue("p_alpha", 0.04f)
.setValue("p_beta", 0.08f)
.setValue("p_radius", 8);
Most of these parameters can be left at their defaults, but we set them here for clarity.
In the above, we created 4 layers using chunk encoders. Each layer is 60×60 units, and each chunk is 6×6, meaning there are 10×10 chunks.
We then generate the hierarchy:
// Generate the hierarchy
std::shared_ptr<ogmaneo::Hierarchy> h = arch.generateHierarchy();
// Input and prediction fields for color components
ogmaneo::ValueField2D inputFieldR(ogmaneo::Vec2i(rescaleRT.getSize().x, rescaleRT.getSize().y), 0.0f);
ogmaneo::ValueField2D inputFieldG(ogmaneo::Vec2i(rescaleRT.getSize().x, rescaleRT.getSize().y), 0.0f);
ogmaneo::ValueField2D inputFieldB(ogmaneo::Vec2i(rescaleRT.getSize().x, rescaleRT.getSize().y), 0.0f);
ogmaneo::ValueField2D predFieldR(ogmaneo::Vec2i(rescaleRT.getSize().x, rescaleRT.getSize().y), 0.0f);
ogmaneo::ValueField2D predFieldG(ogmaneo::Vec2i(rescaleRT.getSize().x, rescaleRT.getSize().y), 0.0f);
ogmaneo::ValueField2D predFieldB(ogmaneo::Vec2i(rescaleRT.getSize().x, rescaleRT.getSize().y), 0.0f);
We also created several value fields, one per color channel for the input and one per channel for the predictions. These are temporary buffers used to supply the hierarchy with input and to read back its predictions; rescaleRT is the render target the demo uses to rescale each video frame to the hierarchy's input resolution.
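As a point of reference, filling one of the input fields from a frame looks roughly like the loop below. This is a sketch: it assumes ValueField2D exposes a setValue(Vec2i, float) accessor (check the OgmaNeo headers), and getRed(x, y) stands in for whatever pixel access your image library provides.
// Sketch: copy the red channel of the rescaled frame into the red input field.
// setValue is assumed to exist on ValueField2D; getRed is a placeholder for
// your image library's pixel access (returning a value in [0, 255]).
for (int y = 0; y < (int)rescaleRT.getSize().y; y++)
    for (int x = 0; x < (int)rescaleRT.getSize().x; x++)
        inputFieldR.setValue(ogmaneo::Vec2i(x, y), getRed(x, y) / 255.0f);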
Assuming we can obtain the video frames (see OgmaNeoDemos for how we did it using SFML and OpenCV), we can then train on the video in an online fashion:
// Fill inputFieldR/G/B with the current frame's pixel values (see the demo),
// then step the hierarchy with learning enabled
std::vector<ogmaneo::ValueField2D> inputVector = { inputFieldR, inputFieldG, inputFieldB };
h->simStep(inputVector, true);
// Read back the predictions of the next frame's color channels
predFieldR = h->getPredictions()[0];
predFieldG = h->getPredictions()[1];
predFieldB = h->getPredictions()[2];
Here, we provided the RGB inputs, and then obtained the RGB next-timestep predictions.
Finally, to recall the video, we simply need to feed the predictions back into the hierarchy as inputs:
// Reuse the input buffer, now holding the previous predictions, and step the
// hierarchy with learning disabled
inputVector = { predFieldR, predFieldG, predFieldB };
h->simStep(inputVector, false);
predFieldR = h->getPredictions()[0];
predFieldG = h->getPredictions()[1];
predFieldB = h->getPredictions()[2];
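Repeating this step in a loop replays the learned sequence. Below is a rough sketch of the structure of such a recall loop (numRecallSteps and the display step are placeholders, not demo code):
// Sketch of a recall loop: every step feeds the previous predictions back in
// as input (with learning disabled) and reads out the next predicted frame.
for (int t = 0; t < numRecallSteps; t++) { // numRecallSteps is a placeholder
    inputVector = { predFieldR, predFieldG, predFieldB };
    h->simStep(inputVector, false);
    predFieldR = h->getPredictions()[0];
    predFieldG = h->getPredictions()[1];
    predFieldB = h->getPredictions()[2];
    // ... convert predFieldR/G/B back into an image and display it (see the demo) ...
}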
It is important to note that during training, frames were presented sequentially. Most deep learning systems are based on backpropagation and therefore require i.i.d. samples. However, due to the online learning nature of our system, we can learn from the inputs sequentially without any sort of history buffer.
Below is a video of the system recalling various video sequences:
For the full code, please visit the OgmaNeoDemos repository linked at the beginning of this article.
Conclusion
We hope that this new architecture will find uses in various fields. It is not intended to replace current deep learning systems on tasks where they excel, but rather to operate in a different problem domain. We tackle online learning problems where backpropagation-based methods are either too slow or not flexible enough. Our system typically takes on the order of minutes to train and can continue training while receiving data.
We encourage users to experiment with and, hopefully, contribute to the development of Feynman Machines, as we believe this problem domain holds a great deal of potential. We will continue developing the Feynman Machine theory and the OgmaNeo software to support users.