Generalization and Exploration via Value Function Randomization

Effective reinforcement learning calls for both efficient exploration and extrapolative generalization. I will discuss a new approach to exploration that combines the merits of provably efficient tabula rasa reinforcement learning algorithms, such as UCRL and PSRL, with those of algorithms that accommodate value function generalization, such as least-squares value iteration and temporal-difference learning. The former require learning times that grow with the cardinality of the state space, whereas the latter tend to be applied in conjunction with inefficient exploration schemes such as Boltzmann and epsilon-greedy exploration. Our new approach explores through randomization of value function estimates.
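
To make the contrast concrete, the sketch below sets the two dithering schemes named above beside one possible form of value function randomization. It is an illustration under stated assumptions, not the algorithm presented in the talk: the linear value model, the Gaussian prior, and the names sample_weights, randomized_value_action, noise_var, and prior_var are all hypothetical choices made for this example.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    z = (q_values - np.max(q_values)) / temperature  # shift for numerical stability
    probs = np.exp(z)
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))

def sample_weights(features, targets, noise_var, prior_var, rng):
    """Draw one weight vector from a Bayesian linear regression posterior.

    Assumption for this sketch: values are linear in the features, with a
    zero-mean Gaussian prior (variance prior_var) and Gaussian noise
    (variance noise_var) on the regression targets.
    """
    d = features.shape[1]
    precision = features.T @ features / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ features.T @ targets / noise_var
    return rng.multivariate_normal(mean, cov)

def randomized_value_action(action_features, weights):
    """Act greedily with respect to the sampled (randomized) value estimate."""
    return int(np.argmax(action_features @ weights))

rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, 0.2])
a_eps = epsilon_greedy(q, epsilon=0.1, rng=rng)
a_boltz = boltzmann(q, temperature=0.5, rng=rng)

# Hypothetical data: 20 observed (feature, value target) pairs in 3 dimensions.
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.normal(size=20)
w = sample_weights(X, y, noise_var=0.01, prior_var=1.0, rng=rng)
a_rand = randomized_value_action(np.eye(3), w)  # one feature row per action
```

In this sketch the dithering schemes perturb the action choice around a single point estimate, whereas the randomized variant perturbs the value estimate itself and then acts greedily. Where data are scarce the sampled weights vary widely, and they concentrate as observations accumulate.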