A Top-down Approach to Feature Selection in Reinforcement Learning

Feature selection is an important problem in many areas of machine learning, including reinforcement learning (RL). One possible approach to feature selection is to solve the learning problem in a high-dimensional feature space, in the hope that the relevant features lie there. However, this approach may overfit and yield poor prediction performance. Two methods that have been used in regression to overcome this problem are regularization (adding ℓ2 and/or ℓ1 penalty terms to the objective function) and random projections (solving the problem in a randomly generated low-dimensional space). In this talk, we study the use of these two methods for value function approximation in RL. In particular, we study the widely used least-squares temporal difference (LSTD) learning algorithm. We first provide a thorough theoretical analysis of LSTD with random projections and derive performance bounds for the resulting algorithm. We then analyze the performance of Lasso-TD, a modification of LSTD in which the projection operator is defined as a Lasso problem.
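
As a rough illustration of the first of these ideas, the following minimal sketch runs LSTD on features that have been mapped to a low-dimensional space by a random Gaussian matrix. It is a sketch under stated assumptions, not the algorithm as analyzed in the talk: the function names (lstd, random_projection, lstd_random_projection), the N(0, 1/d) scaling of the projection entries, and the small ridge term added for numerical stability are all choices made here for illustration.

```python
import numpy as np


def lstd(phi, phi_next, rewards, gamma=0.95, ridge=1e-6):
    """Batch LSTD: solve A theta = b for the value-function weights.

    phi, phi_next: (n, k) feature matrices for states and successor states.
    rewards: (n,) observed rewards.
    ridge: small diagonal term for numerical stability (an assumption of
    this sketch, not part of the basic LSTD definition).
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A + ridge * np.eye(A.shape[0]), b)


def random_projection(high_dim, low_dim, rng):
    """Gaussian random matrix with i.i.d. N(0, 1/low_dim) entries (assumed scaling)."""
    return rng.normal(0.0, 1.0 / np.sqrt(low_dim), size=(low_dim, high_dim))


def lstd_random_projection(phi, phi_next, rewards, d, gamma, rng):
    """Project D-dimensional features down to d dimensions, then run LSTD there."""
    P = random_projection(phi.shape[1], d, rng)
    theta = lstd(phi @ P.T, phi_next @ P.T, rewards, gamma)
    return theta, P


if __name__ == "__main__":
    # Synthetic data only, to show the calling convention.
    rng = np.random.default_rng(0)
    n, D, d = 500, 200, 10
    phi = rng.normal(size=(n, D))
    phi_next = rng.normal(size=(n, D))
    rewards = rng.normal(size=n)
    theta, P = lstd_random_projection(phi, phi_next, rewards, d, 0.9, rng)
    # Value estimate for a state with features x: theta @ (P @ x)
    print(theta.shape, P.shape)
```

The point of the construction is that the linear system is solved in d dimensions rather than D, so the estimation error depends on the small dimension, at the cost of an approximation error introduced by the projection.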