If Only We Had Tracked Something Like This Before

We want algorithms that can tell us what went where in a video. Tracking is hard because each situation is different: each video has a different camera operator, and its subjects' appearance, motion, and setting occur in a novel configuration. While each video may be pixel-wise unique, we hypothesize that the motions observed across videos are quite similar, at least for tracking purposes. We propose that better tracking can be achieved by learning to automatically associate different videos (or parts of videos) with different algorithms. Instead of seeking an elusive one-size-fits-all tracking strategy (often in the form of an energy function), we advocate keeping multiple strategies and recognizing when and where to use each. We demonstrate this approach for the problems of optical flow and interest-point tracking.