Learning Dynamic Locomotion Skills for Terrains with Obstacles
Using reinforcement learning to develop motor skills for articulated figures is challenging because their state and action spaces are high-dimensional and continuous. In this work, we learn control policies for dynamic gaits across terrains with sequences of gaps, walls, and steps. Results are demonstrated using physics-based simulations of a 21-link planar dog and a 7-link planar biped. Our approach is characterized by a number of features, including: nonparametric representations of the value function and the control policy; value iteration using batched positive-TD updates; localized epsilon-greedy exploration; and an action parameterization tailored to the problem domain. In support of the nonparametric representations, we further optimize a task-specific distance metric. The policies are computed offline using repeated iterations of epsilon-greedy exploration and value iteration; the final control policies then run in real time over novel terrains. We evaluate the impact of the key features of our skill-learning pipeline on the resulting performance.
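The learning loop summarized above can be sketched in miniature. The following is an illustrative toy, not the paper's implementation: it assumes a hypothetical 1-D "terrain" where an agent advances toward a goal cell, whereas the real system operates on high-dimensional character states with parameterized actions. The sketch shows three of the abstract's ingredients: a nonparametric (nearest-sample) value function, epsilon-greedy exploration, and a single-sample stand-in for the batched positive-TD update, which commits only updates that raise the current value estimate.

```python
import random

# Toy problem constants (all illustrative, not from the paper):
GOAL, ACTIONS, GAMMA, EPSILON = 10, [1, 2, 3], 0.9, 0.3

class NearestNeighborValue:
    """Nonparametric value function: stored (state, value) samples,
    queried via the nearest sample within a locality radius."""
    def __init__(self, radius=1.0, default=0.0):
        self.samples, self.radius, self.default = [], radius, default

    def query(self, state):
        if not self.samples:
            return self.default
        s, v = min(self.samples, key=lambda sv: abs(sv[0] - state))
        return v if abs(s - state) <= self.radius else self.default

def step(state, action):
    # Advance along the 1-D terrain; reward 1 on reaching the goal.
    nxt = min(state + action, GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def epsilon_greedy(value_fn, state):
    # Exploration: random action with probability epsilon, otherwise
    # the action whose successor state has the highest estimated value.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: value_fn.query(step(state, a)[0]))

def positive_td_update(value_fn, state, reward, next_state, done):
    # Positive-TD rule (single-sample stand-in for the batched update):
    # keep only updates whose TD target exceeds the current estimate.
    target = reward if done else reward + GAMMA * value_fn.query(next_state)
    if target > value_fn.query(state):
        value_fn.samples.append((state, target))

random.seed(0)
V = NearestNeighborValue()
for _ in range(200):  # offline learning: repeated exploration episodes
    state, done = 0, False
    while not done:
        action = epsilon_greedy(V, state)
        nxt, reward, done = step(state, action)
        positive_td_update(V, state, reward, nxt, done)
        state = nxt
```

After training, the greedy policy derived from `V` can be rolled out without exploration, mirroring the offline-learn / online-execute split described in the abstract.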