Unsupervised Behavioral Learning (UBL) revisited

Posted on July 12, 2021 (updated October 18, 2024) by Eric Laukien

In a previous post, I described an alternative to reinforcement learning (RL) called Unsupervised Behavioral Learning (UBL). In short, instead of maximizing rewards, the agent seeks to match its current state to some goal state (which is spatio-temporal).
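
To make the contrast with RL concrete, here is a minimal sketch of a UBL control step in Python. It is not the AOgmaNeo API; the hierarchy object and its `top_csdr`/`act_toward` methods are hypothetical placeholders, and CSDRs are shown simply as arrays of per-column active cell indices.

```python
# A minimal sketch of the UBL idea, assuming a hypothetical hierarchy object.
# None of these names come from the AOgmaNeo API.
import numpy as np

def csdr_match(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of columns whose active cell index agrees between two CSDRs."""
    return float(np.mean(a == b))

def ubl_step(hierarchy, observation, goal_csdr):
    """One control step: perceive, then act toward the stored goal CSDR
    instead of climbing a reward signal."""
    hierarchy.step(observation)                 # update the hierarchy with new sensory input
    current = hierarchy.top_csdr()              # current top-level (spatio-temporal) state
    progress = csdr_match(current, goal_csdr)   # how close the agent is to the goal
    action = hierarchy.act_toward(goal_csdr)    # actions are conditioned on the goal, not a reward
    return action, progress
```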

We have decided to return to the idea with a new real-world demonstration. This latest iteration of UBL is built on AOgmaNeo, the most up-to-date implementation of Sparse Predictive Hierarchies (SPH) at the time of writing. Along with the better all-around performance AOgmaNeo brings, we also updated the UBL algorithm a bit. We are still working out the best version, but we already have some interesting results to share.

In the video below, we trained the latest version of our “World’s Smallest Self-Driving Car” (v4) to act as a “rat” in a simple cardboard T-maze. The walls carry colored markings which, aside from helping the robot identify its location, also serve as goal states. After driving the robot around the maze by hand semi-randomly, we can move it to a recognizable landmark (in front of a colored marking) and save the top-level state of the hierarchy as the goal state (a CSDR) by pressing a button. If you don’t know what a CSDR is, we recently made a guide for the regular edition (master branch) of AOgmaNeo. The rat robot can then be moved to some other location in the maze, and it will try to return to the goal state. As usual with SPH, all processing happens on-board the Pi Zero, including online training. The only sensor the rat uses is a small fisheye camera.
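
For readers who want a mental model of that flow, here is a hypothetical sketch of the on-board loop: manual driving for a while, a button press to save the current top-level CSDR as the goal, then letting the hierarchy drive back toward it. The helper names (`camera`, `motors`, `button`, `teleop`, `act_toward`) are assumptions for illustration, not the actual demo code.

```python
# Hypothetical sketch of the on-board demo loop; camera, motors, button, teleop
# and the hierarchy methods are made-up placeholders, not the real demo code.

def demo_loop(hierarchy, camera, motors, button, teleop):
    goal_csdr = None
    while True:
        frame = camera.read()          # the fisheye camera is the only sensor
        hierarchy.step(frame)          # online training happens on every step, on the Pi Zero

        if button.pressed():
            # Park the robot in front of a colored marking before pressing:
            goal_csdr = hierarchy.top_csdr().copy()   # save the goal state (a CSDR)

        if goal_csdr is None or teleop.active():
            action = teleop.read()                    # semi-random manual driving phase
        else:
            action = hierarchy.act_toward(goal_csdr)  # try to return to the goal state

        motors.apply(action)
```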

Since goal states come from the top-most CSDR in the SPH hierarchy, they are spatio-temporal. This means they can capture not only static states but entire behaviors as well. In this case, that feature isn’t really needed, as we just want the robot to sit still in front of the colored markings. We do, however, have to make sure the robot has been sitting still for a bit when determining the goal CSDR (just a few seconds of sitting still is enough).
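
As an illustration of that settling step, the sketch below (again with hypothetical helpers) only accepts the goal CSDR once the top-level state has stopped changing for a few seconds, so the captured spatio-temporal goal encodes “sitting still at the marking” rather than some transient behavior.

```python
# Illustrative settle check (hypothetical helpers): only accept the goal CSDR once
# the top-level state has stopped changing for a few seconds.
import time
import numpy as np

def capture_goal_when_settled(hierarchy, camera, settle_seconds=3.0):
    last = hierarchy.top_csdr().copy()
    stable_since = time.time()
    while time.time() - stable_since < settle_seconds:
        hierarchy.step(camera.read())
        current = hierarchy.top_csdr()
        if not np.array_equal(current, last):   # state changed, restart the settle timer
            stable_since = time.time()
            last = current.copy()
    return last                                  # goal CSDR captured while sitting still
```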

We are also working on an adapter for using UBL as a regular RL agent. We will likely base it on Bandit Swarm Networks (BSN), as these are good at finding rewarding static configurations such as goal states. Hopefully that will be working properly by the next blog post!
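
Purely as a simplified illustration of that direction (not the planned implementation), such an adapter could keep a bank of candidate goal CSDRs and use a simple bandit, standing in for the BSN, to pick whichever goal has been the most rewarding, leaving UBL to actually reach it:

```python
# A very rough, simplified stand-in for the planned BSN-based adapter: an
# epsilon-greedy bandit picks among stored candidate goal CSDRs based on the
# reward obtained, and UBL is left to actually reach whichever goal is picked.
import numpy as np

class GoalBandit:
    def __init__(self, num_goals, epsilon=0.1):
        self.values = np.zeros(num_goals)    # running reward estimate per goal
        self.counts = np.zeros(num_goals)
        self.epsilon = epsilon

    def select(self) -> int:
        if np.random.rand() < self.epsilon:
            return int(np.random.randint(len(self.values)))   # explore a random goal
        return int(np.argmax(self.values))                    # exploit the best goal so far

    def update(self, goal_index: int, reward: float):
        self.counts[goal_index] += 1
        self.values[goal_index] += (reward - self.values[goal_index]) / self.counts[goal_index]
```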

Until next time!
