It’s time for us to finally show off our Atari Pong demo! Our Sparse Predictive Hierarchies (SPH, as implemented in OgmaNeo) can now play Atari games. Our first test is Pong, a classic benchmark for reinforcement learning from pixel data.
If you need a refresher on how the prediction-only version of OgmaNeo2 works (upon which the following is based), see this slideshow presentation.
Our agent has a capability that sets it apart from typical deep learning (DL) approaches: it is fast enough to run in real time (60 fps), with learning enabled, on a single CPU core of a Raspberry Pi 4. It owes this capability primarily to its use of online/incremental learning and high levels of sparsity (active ratio of roughly 6%).
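As a rough back-of-envelope illustration of why that sparsity matters (the actual update rule is more involved, and these are only the figures stated in this post), a sparse incremental update only needs to touch the synapses of currently active cells:

```python
# Back-of-envelope cost of one sparse incremental update, using
# the figures stated in this post: ~1 million synapses, ~6% active.
total_synapses = 1_000_000
active_ratio = 0.06

# A dense update would touch every synapse each step; a sparse
# update only touches synapses belonging to active cells.
sparse_updates = int(total_synapses * active_ratio)
print(sparse_updates)  # 60000
```

That is roughly a 17x reduction in per-step work compared to a dense update, which is a big part of what makes real-time learning on a Pi feasible.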
The agent we used to accomplish this uses the “routing-based method” of reinforcement learning described in this post.
Some information about the agent used:
- Input image: grayscale, downsampled and cropped to 64×64.
- Image pre-encoder: random projection followed by inhibition. Size: 10×10×16 neurons (W×H×ColSize); connectivity radius: 5.
- 2 layers of encoders, trained to minimize reconstruction error. Size: 5×5×16.
- The routed network and the encoders share the same connectivity radius of 5.
- 3 layers total, 2,400 cells, and about 1 million synapses.
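To make the pre-encoder concrete, here is a minimal sketch of "random projection followed by inhibition" in NumPy. It is a simplification, not the OgmaNeo implementation: for brevity each column projects the whole image rather than a local radius-5 patch, and inhibition is modeled as a hard winner-take-all within each column of 16 cells (which yields 1/16 ≈ 6% activity, matching the active ratio mentioned above):

```python
import numpy as np

def encode(image, n_cols=100, col_size=16, seed=0):
    """Toy pre-encoder: fixed random projection + per-column inhibition.

    image: 2D grayscale array (e.g. 64x64, values in [0, 1]).
    Returns one winning cell index per column (a columnar SDR).
    """
    rng = np.random.default_rng(seed)
    # Fixed random projection weights, one matrix per column.
    # (In OgmaNeo each column would see only a local patch, radius 5.)
    W = rng.standard_normal((n_cols, col_size, image.size)).astype(np.float32)
    acts = W @ image.ravel().astype(np.float32)  # (n_cols, col_size)
    # Inhibition: keep only the most active cell in each column,
    # so exactly 1 of 16 cells fires (6.25% active ratio).
    return acts.argmax(axis=1)

img = np.random.default_rng(1).random((64, 64)).astype(np.float32)
sdr = encode(img)
print(sdr.shape)  # (100,) — one winner per 10x10 grid column
```

The columnar winner-take-all structure is what makes the representation both sparse and cheap to learn from: downstream layers only need to look at which cell won in each column.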
Note: the data shown here was gathered from an agent trained on a regular desktop CPU (still single-core). As mentioned above, we know from previous runs that the agent runs and learns fine on a Pi, but gathering the data on the desktop was more convenient. If people are interested, we can report total runtimes from a version run entirely on the Pi.
Here is a graph of reward vs episodes:
The graph has a bit of a strange shape; we are looking into why this is.
We are still in the process of cleaning up the code for full release, but if you would like to try it early, you can check out the Cire_CPU_Route_SARSA_Reverse_ReconEnc branch from our development fork.
Until next time!