Learning About Sensorimotor Data

Jan 25, 2012

Temporal-difference (TD) learning of reward predictions underlies both reinforcement-learning algorithms and the standard dopamine model of reward-based learning in the brain. This confluence of computational and neuroscientific ideas is perhaps the most successful since the Hebb synapse. Can it be extended beyond reward? The brain certainly predicts many things other than reward (for example, in a forward model of the consequences of different ways of behaving), and TD methods can be used to make these predictions too.

The idea of using TD methods to learn large numbers of predictions about many states and stimuli in parallel, and the advantages of doing so, have been apparent since the 1990s, but technical issues have kept this vision from being implemented in practice...until now. A key breakthrough was the development of a new family of gradient-TD methods, introduced at NIPS in 2008 (by Maei, Szepesvari, and myself).

Using these methods, among other ideas, we are now able to learn thousands of non-reward predictions in real time, at 10 Hz, from a single sensorimotor data stream from a physical robot. These predictions are temporally extended (anticipating up to tens of seconds ahead), goal-oriented, and policy-contingent. The new algorithms allow learning to be off-policy and parallel, which dramatically increases the amount that can be learned in a given amount of time. Our effective learning rate scales linearly with computational resources: on a consumer laptop we can learn thousands of predictions in real time, and on a larger computer, or on a comparable laptop in a few years, the same methods could learn millions of meaningful predictions about alternative ways of behaving. In aggregate, these predictions constitute a rich, detailed model of the world that can support planning methods such as approximate dynamic programming.
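The abstract describes learning many non-reward predictions off-policy and in parallel with gradient-TD methods. Below is a minimal Python sketch of what such a learner might look like. It assumes linear function approximation over a feature vector shared by all predictions, a constant discount per prediction, no eligibility traces (lambda = 0), and the TDC form of gradient-TD with per-decision importance sampling. The names `Demon`, `cumulant`, `target_prob`, and `update_all`, and the default step sizes of 0.1, are hypothetical choices for illustration. This is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


@dataclass
class Demon:
    """One off-policy prediction learner (hypothetical structure).

    cumulant(next_obs)       -> signal being predicted (need not be reward)
    target_prob(obs, action) -> probability the target policy takes `action`
    gamma                    -> constant continuation probability in [0, 1)
    """
    cumulant: Callable[[object], float]
    target_prob: Callable[[object, int], float]
    gamma: float
    n_features: int
    alpha: float = 0.1  # assumed step size for the primary weights theta
    beta: float = 0.1   # assumed step size for the secondary weights w
    theta: List[float] = field(init=False, default_factory=list)
    w: List[float] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        self.theta = [0.0] * self.n_features
        self.w = [0.0] * self.n_features

    def predict(self, phi: Sequence[float]) -> float:
        """Linear prediction theta . phi."""
        return dot(self.theta, phi)

    def update(self, obs, phi, action, behavior_prob, next_obs, phi_next) -> None:
        """One TDC step (lambda = 0) with per-decision importance sampling."""
        rho = self.target_prob(obs, action) / behavior_prob  # importance ratio pi/b
        delta = (self.cumulant(next_obs)
                 + self.gamma * dot(self.theta, phi_next)
                 - dot(self.theta, phi))                    # TD error
        w_phi = dot(self.w, phi)                            # computed before w changes
        for i in range(self.n_features):
            # Primary update with gradient-correction term -gamma * (w . phi) * phi_next
            self.theta[i] += self.alpha * rho * (
                delta * phi[i] - self.gamma * w_phi * phi_next[i])
            # Secondary weights track the expected importance-weighted TD error
            self.w[i] += self.beta * (rho * delta - w_phi) * phi[i]


def update_all(demons, obs, phi, action, behavior_prob, next_obs, phi_next) -> None:
    """Apply one shared transition to every prediction.

    Each update is independent of the others, so they could run in parallel.
    """
    for d in demons:
        d.update(obs, phi, action, behavior_prob, next_obs, phi_next)
```

Every prediction reuses the same transition and feature vector, and each update costs O(n) in the number of features. Total cost therefore grows linearly with the number of predictions, which fits the abstract's point that learning scales with computation. As a sanity check, consider a constant feature and a cumulant of 1 on every step. A prediction with `gamma = 0.5` should settle near 1/(1 - 0.5) = 2. A transition where the target policy would never take the chosen action (rho = 0) leaves `theta` unchanged.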

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.