Dyna(k): A Multi-Step Dyna Planning
Dyna planning is an efficient way of learning from real and imaginary experience. Existing tabular and linear Dyna algorithms are single-step, because an "imaginary" feature is predicted only one step into the future. In this paper, we introduce multi-step Dyna planning, which predicts more than one step into the future. Multi-step Dyna can infer a sequence of multi-step outcomes when a real instance occurs, provided that the instance itself, or a similar experience, has been imagined (i.e., simulated from the model) and planned. Our multi-step Dyna is based on a multi-step model, which we call the λ-model. The λ-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online. The multi-step Dyna algorithm, Dyna(k), uses the λ-model to generate predictions k steps ahead of the imagined feature, and applies TD to this imaginary multi-step transition.
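To make the idea of planning on a k-step imagined transition concrete, the following is a minimal sketch under stated assumptions, not the algorithm as specified in the paper. It assumes a linear one-step model with a transition matrix `F` and a reward vector `b`, and it obtains the k-step prediction simply by iterating that one-step model k times; this iteration is a stand-in for the λ-model, whose exact form is not given here. The names `F`, `b`, `theta`, `k_step_projection`, and `planning_update` are hypothetical and chosen only for illustration.

```python
import numpy as np


def k_step_projection(F, b, phi, k, gamma):
    """Roll an assumed one-step linear model forward k times.

    F     : (n, n) matrix, assumed model of the expected next feature vector
    b     : (n,) vector, assumed model of the expected one-step reward
    phi   : (n,) imagined feature vector to start from
    Returns the discounted reward accumulated over the k imagined steps
    and the predicted feature vector k steps ahead.
    """
    reward = 0.0
    discount = 1.0
    for _ in range(k):
        reward += discount * float(b @ phi)  # predicted reward at this step
        phi = F @ phi                        # predicted next feature vector
        discount *= gamma
    return reward, phi


def planning_update(theta, F, b, phi, k, gamma, alpha):
    """Apply one TD update to the weights theta on an imagined k-step transition."""
    r_k, phi_k = k_step_projection(F, b, phi, k, gamma)
    td_error = r_k + gamma ** k * float(theta @ phi_k) - float(theta @ phi)
    return theta + alpha * td_error * phi
```

With `k = 1` this sketch reduces to a single-step linear Dyna planning update; larger `k` bootstraps from a prediction further into the imagined future.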