Training Structured Predictors for Novel Loss Functions
As motivation we consider the PASCAL image segmentation challenge. Given an image and a target class, such as person, the challenge is to segment the image into regions occupied by objects of that class (person foreground) and regions not occupied by that class (non-person background). At the present state of the art the lowest pixel error rate is achieved by predicting all background. However, the challenge is evaluated with an intersection-over-union score under which the all-background prediction scores zero. This raises the question of how one incorporates a particular loss function into the training of a structured predictor. A standard approach is to incorporate the desired loss into the structured hinge loss and observe that, for any loss, the structured hinge loss is an upper bound on the desired loss. However, this upper bound is quite loose, and it is far from clear that the structured hinge loss is an appropriate or useful way to handle the PASCAL evaluation measure. This talk reviews various approaches to this problem and presents a new training algorithm we call the good-label-bad-label algorithm. We prove that in the data-rich regime the good-label-bad-label algorithm follows the gradient of the training loss, assuming only that we can perform inference in the given graphical model. The algorithm is structurally similar to, but significantly different from, stochastic subgradient descent on the structured hinge loss (which does not follow the loss gradient).
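The contrast between the two evaluation measures can be made concrete with a small sketch (not part of the talk; the toy 10-pixel masks and helper functions below are illustrative assumptions): under pixel accuracy the trivial all-background prediction looks strong, while intersection-over-union assigns it a score of zero.

```python
def pixel_accuracy(pred, gold):
    """Fraction of pixels labeled correctly (1 = foreground, 0 = background)."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def iou(pred, gold):
    """Intersection over union of predicted and gold foreground pixel sets.
    An empty union scores 0 by convention."""
    inter = sum(p and g for p, g in zip(pred, gold))
    union = sum(p or g for p, g in zip(pred, gold))
    return inter / union if union else 0.0

# A toy 10-pixel "image" whose person foreground occupies 2 pixels.
gold = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
all_background = [0] * 10
partial = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]  # one hit, one miss, one false alarm

print(pixel_accuracy(all_background, gold))  # 0.8 -- strong under pixel error
print(iou(all_background, gold))             # 0.0 -- scores zero under IoU
print(pixel_accuracy(partial, gold))         # 0.8 -- same pixel accuracy...
print(iou(partial, gold))                    # 0.333... -- ...but nonzero IoU
```

Note that the two predictions are indistinguishable under pixel accuracy, which is why a training objective aligned with the actual evaluation measure matters.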