Hoeffding and Bernstein Races for Selecting Policies in Evolutionary Direct Policy Search

Uncertainty arises in reinforcement learning from various sources, so evaluating a behavioral policy requires statistics based on several roll-outs. We add an adaptive uncertainty handling based on Hoeffding and empirical Bernstein races to the CMA-ES, a variable metric evolution strategy proposed for direct policy search. The uncertainty handling individually adjusts the number of episodes used to evaluate each policy. The performance estimates are kept just accurate enough for a sufficiently good ranking of the candidate policies, which in turn suffices for the CMA-ES to find better solutions. This increases both the learning speed and the robustness of the algorithm.
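
The racing idea can be illustrated with a small sketch. It is not the authors' implementation; it assumes that episode returns lie in an interval of known width R (value_range) and uses the standard Hoeffding bound and the empirical Bernstein bound of Audibert et al. as confidence-interval half-widths. The helper names (race_select, hoeffding_radius, bernstein_radius, noisy_rollout), the simplified selection rule, and the even split of the failure probability delta across all intervals are illustrative assumptions. In each round, only candidates whose membership among the mu best is still undecided receive another episode. A candidate is accepted once its interval lies entirely above those of at least lambda - mu others, and it is rejected once at least mu others lie entirely above it.

```python
import math
import random
import statistics


def hoeffding_radius(samples, value_range, delta):
    # Hoeffding: |mean - mu| <= R * sqrt(ln(2/delta) / (2n)) with prob. >= 1 - delta.
    n = len(samples)
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def bernstein_radius(samples, value_range, delta):
    # Empirical Bernstein: sigma_n * sqrt(2 ln(3/delta) / n) + 3 R ln(3/delta) / n,
    # where sigma_n is the biased (population) empirical standard deviation.
    n = len(samples)
    log_term = math.log(3.0 / delta)
    sigma = statistics.pstdev(samples)
    return sigma * math.sqrt(2.0 * log_term / n) + 3.0 * value_range * log_term / n


def race_select(policies, rollout, mu, value_range, delta=0.05,
                max_episodes=100, radius=bernstein_radius):
    """Hypothetical sketch: pick mu of the policies, sampling episodes only while undecided."""
    lam = len(policies)
    returns = [[] for _ in range(lam)]
    # Assumed union bound over at most lam * max_episodes confidence intervals.
    delta_i = delta / (lam * max_episodes)
    selected, discarded = set(), set()
    for _ in range(max_episodes):
        undecided = [i for i in range(lam) if i not in selected and i not in discarded]
        # Only candidates whose rank is still uncertain get another episode.
        for i in undecided:
            returns[i].append(rollout(policies[i]))
        means = [statistics.fmean(r) for r in returns]
        rad = [radius(r, value_range, delta_i) for r in returns]
        lower = [m - c for m, c in zip(means, rad)]
        upper = [m + c for m, c in zip(means, rad)]
        for i in undecided:
            beats = sum(upper[j] < lower[i] for j in range(lam) if j != i)
            beaten_by = sum(lower[j] > upper[i] for j in range(lam) if j != i)
            if beats >= lam - mu:
                selected.add(i)   # interval says: among the mu best
            elif beaten_by >= mu:
                discarded.add(i)  # interval says: not among the mu best
        if len(selected) >= mu or len(discarded) >= lam - mu:
            break
    # Fill the remaining slots by empirical mean (race decided or budget exhausted).
    rest = [i for i in range(lam) if i not in selected and i not in discarded]
    ranked = (sorted(selected, key=lambda i: -means[i])
              + sorted(rest, key=lambda i: -means[i]))
    return ranked[:mu], [len(r) for r in returns]


# Usage with hypothetical policies, represented here by their true mean returns.
rng = random.Random(0)
true_means = [0.1, 0.2, 0.8, 0.9, 0.5]


def noisy_rollout(m):
    # Hypothetical episode return in [0, 1]: true mean plus bounded noise.
    return min(1.0, max(0.0, m + rng.uniform(-0.05, 0.05)))


best, episodes = race_select(true_means, noisy_rollout, mu=2,
                             value_range=1.0, max_episodes=200)
print(sorted(best))  # [2, 3]
print(episodes)      # episodes spent per candidate
```

Passing radius=hoeffding_radius switches to a Hoeffding race. The Bernstein radius shrinks faster when the observed returns have low variance, but its 3 R ln(3/delta) / n term dominates for small n. The per-candidate episode cap therefore matters in practice, because otherwise the race could run for a long time on candidates whose means are nearly equal.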

Christian Igel