
Thompson Sampling for the Duelling Bandits Problem

May 28, 2013 · 3350 views

In surprisingly many situations, absolute rewards are not available (or are nonstationary) while relative preferences are easy to collect (or are stable). This variation of the bandit problem is known as the duelling bandits problem (or, in the US, dueling bandits; see http://www.cs.cornell.edu/people/tj/publications/yue_etai_09a.pdf). My talk will cover our preliminary work developing a Thompson sampling algorithm for the duelling (or dueling) bandit problem.
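To make the setting concrete, here is a minimal sketch of one way Thompson sampling can be adapted to duelling bandits. It is not the algorithm from the talk; it is a hypothetical illustration that keeps a Beta posterior over each pairwise win probability, samples a preference matrix each round, and duels the two arms with the highest sampled Copeland scores (number of rivals an arm is sampled to beat). All names (`thompson_dueling`, `pref`, etc.) are invented for this example.

```python
import random

def thompson_dueling(pref, horizon=3000, seed=0):
    """Illustrative Thompson sampling sketch for duelling bandits.

    pref[i][j] is the (unknown to the learner) probability that arm i
    beats arm j in a duel.  Only the relative outcome of each duel is
    observed, never an absolute reward.
    """
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]   # wins[i][j]: times i beat j
    for _ in range(horizon):
        # Sample a plausible preference matrix from the Beta posteriors.
        theta = [[0.5] * k for _ in range(k)]
        for i in range(k):
            for j in range(k):
                if i != j:
                    theta[i][j] = rng.betavariate(wins[i][j] + 1,
                                                  wins[j][i] + 1)
        # Copeland score: how many rivals arm i is sampled to beat.
        score = [sum(theta[i][j] > 0.5 for j in range(k) if j != i)
                 for i in range(k)]
        ranked = sorted(range(k), key=lambda i: -score[i])
        a, b = ranked[0], ranked[1]
        # Duel the two top-scoring arms; record only who won.
        if rng.random() < pref[a][b]:
            wins[a][b] += 1
        else:
            wins[b][a] += 1
    return wins

# Example environment: arm 0 is a clear Condorcet winner.
pref = [[0.5, 0.8, 0.9],
        [0.2, 0.5, 0.7],
        [0.1, 0.3, 0.5]]
wins = thompson_dueling(pref)
```

Over time the posteriors concentrate and the sampled Copeland ranking tends to pit the strongest arms against each other, which is the exploration/exploitation trade-off Thompson sampling handles implicitly.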


Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.