Thompson Sampling for the Duelling Bandits Problem
In surprisingly many situations, absolute rewards are not available (or nonstationary) while relative preferences are easy to collect (or stable). This variation of the bandit problem is known at the duelling bandits (or, dueling bandits in the US; see http://www.cs.cornell.edu/people/tj/publications/yue_etai_09a.pdf). My talk will cover our preliminary work developing a Thompson sam piing algorithm for the duelling (or dueling) bandit problem.