We have applied our OgmaNeo2 online/incremental learning software to the problem of robotic quadruped control. Using reinforcement learning (RL), we learned a policy that walks slightly faster than the hand-coded one.
For this experiment, we used the Stanford Pupper robot designed by the Stanford Robotics Club, and used its hand-coded policy as a “starting point” for training.
Initializing the reinforcement learning from the hand-coded policy keeps the robot from flailing randomly and damaging itself, since training starts from a policy that already walks.
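To make the warm start concrete, here is a minimal sketch of the two-phase schedule. All of the names (`agent`, `env`, `handcoded_policy`) are hypothetical stand-ins rather than the OgmaNeo2 API; the point is just the switch from imitation to reward-driven learning.

```python
# Minimal sketch of the two-phase schedule (hypothetical interface,
# not the OgmaNeo2 API).

IMITATION_STEPS = 2000  # assumed length of the imitation warm-start

def train(agent, env, handcoded_policy, total_steps=10_000):
    obs = env.reset()
    reward = 0.0
    for step in range(total_steps):
        if step < IMITATION_STEPS:
            # Warm start: execute the hand-coded action and train the
            # agent to reproduce it, so the robot never flails randomly.
            action = handcoded_policy(obs)
            agent.step(obs, reward, imitate_action=action)
        else:
            # RL phase: the agent picks its own actions, shaped by reward.
            action = agent.step(obs, reward)
        obs, reward = env.step(action)
```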
To do this, we made some small modifications to the encoder portion of the hierarchy to prevent a certain “stuck” condition that can occur when starting with imitation learning and then moving to RL. This modification is included in the latest release of OgmaNeo2.
Here is a video of the robot and OgmaNeo2 in action.
Some more details:
- Uses an optical flow sensor mounted on the front of the robot to estimate forward speed, which serves as the reward signal (see the sketch after this list).
- Runs directly on the Raspberry Pi 4 CPU, no offloading.
- Training time: approximately 2 minutes.
- Optimized for speed only, so it may turn a bit.
- The original kinematic policy did not use the IMU (accelerometer and gyro), but the RL one does.
- The version of OgmaNeo2 used in the video includes eligibility traces, which are not currently in the main OgmaNeo2 release; the release version works as well. A generic trace update is sketched below.
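For readers curious about the reward signal mentioned above, here is a hedged sketch of how forward speed from an optical flow sensor can be turned into a reward. The driver function `read_flow()` and the calibration constant are assumptions standing in for the real sensor code, not the code used on the robot.

```python
# Hedged sketch of the reward: estimated forward speed from optical flow.
# read_flow() and FLOW_TO_M_PER_S are assumptions standing in for the real
# sensor driver and its calibration.

FLOW_TO_M_PER_S = 0.002  # assumed calibration: flow counts per step -> m/s

def read_flow():
    """Stub for the optical flow driver; returns (dx, dy) pixel deltas
    since the last read. Replace with the real sensor read on the robot."""
    return 0, 0

def speed_reward():
    dx, dy = read_flow()
    forward = dy  # assume the sensor's +y axis points along the body
    return forward * FLOW_TO_M_PER_S  # reward = estimated forward speed
```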
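And since eligibility traces came up, here is a generic TD(λ)-style update with an accumulating trace for a linear value function, purely to illustrate the technique. This is not OgmaNeo2's internal implementation, and the hyperparameters are assumed.

```python
import numpy as np

# Generic TD(lambda) update with an accumulating eligibility trace, for a
# linear value function v(s) = w @ phi(s). Illustrative only; hyperparameters
# are assumed.

ALPHA, GAMMA, LAM = 0.01, 0.99, 0.9

def td_lambda_step(w, z, phi_s, phi_s_next, reward):
    delta = reward + GAMMA * (w @ phi_s_next) - (w @ phi_s)  # TD error
    z = GAMMA * LAM * z + phi_s  # decay the trace, then add current features
    w = w + ALPHA * delta * z    # credit spreads over recently visited states
    return w, z

# Example usage with an 8-dimensional feature vector:
w = np.zeros(8)
z = np.zeros(8)
w, z = td_lambda_step(w, z, np.ones(8), np.ones(8), reward=1.0)
```

The trace lets a single reward update not just the current state's value but also the values of recently visited states, which speeds up credit assignment in tasks like gait learning where rewards arrive continuously.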
By the way, in case you missed it, we recently posted a tutorial on OgmaNeo2, so you can learn to use it in your own projects!
Until next time!