In a standard Markov decision process (MDP), rewards are assumed to be precisely known and quantitative in nature. This hypothesis can be too strong in some situations. When rewards can really be