Stochastic optimal control theory is a principled approach to computing optimal actions with delayed rewards. The use of this approach in AI and machine learning has been limited due to the computationa