The exploration-exploitation dilemma has been an intriguing and unsolved problem within the framework of reinforcement learning. Optimism in the face of uncertainty and model building play central rol