Learning from Weakly Labeled Data

In many machine learning problems, the labels of the training examples are incomplete. Examples include (i) semi-supervised learning, where only some labels are known; (ii) multi-instance learning, where labels are known only implicitly, at the level of bags rather than individual instances; and (iii) clustering, where no labels are known at all. In this talk, focusing on the SVM as the learner, I will describe a label generation strategy that leads to a convex relaxation of the underlying mixed integer programming problem. Computationally, this relaxation can be solved as a sequence of SVM subproblems, which makes it much more scalable than existing convex SDP relaxations. Experiments on all three weakly labeled learning tasks also show improved performance. (joint work with Yu-Feng Li, Ivor W. Tsang, and Zhi-Hua Zhou)
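The following is a minimal, illustrative sketch of what one "label generation" step could look like. It rests on assumptions that are not taken from the talk. The first is a bias-free SVM dual G(alpha, y) = sum(alpha) - 1/2 * sum_ij alpha_i alpha_j y_i y_j K_ij. The second is a cutting-plane reading of the relaxation, in which the mixed integer problem "min over labelings y of max over alpha of G(alpha, y)" is replaced by "min over convex weights mu of max over alpha of sum_t mu_t G(alpha, y_t)" across a working set of labelings y_t. Each outer iteration then adds the labeling that most violates the current solution, meaning the y that minimizes G(alpha, y) for the current alpha. The code covers only that generation step, and it finds the labeling by brute-force enumeration on a tiny problem. The function names and the balance constraint `max_imbalance` are hypothetical.

```python
from itertools import product


def svm_dual_objective(alpha, K, y):
    """Assumed bias-free SVM dual: G(alpha, y) = sum(alpha) - 0.5 * sum_ij a_i a_j y_i y_j K_ij."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def candidate_labelings(y_known, max_imbalance):
    """Enumerate +/-1 labelings consistent with the known labels (None = unknown).

    The balance constraint |sum(y)| <= max_imbalance is a hypothetical stand-in for
    whatever constraint set the real method uses to rule out trivial labelings.
    """
    unknown = [i for i, v in enumerate(y_known) if v is None]
    for assignment in product((-1, 1), repeat=len(unknown)):
        y = list(y_known)
        for i, v in zip(unknown, assignment):
            y[i] = v
        if abs(sum(y)) <= max_imbalance:
            yield tuple(y)


def most_violated_labeling(alpha, K, y_known, max_imbalance):
    """Label-generation step (sketch): return the labeling minimizing G(alpha, .)."""
    return min(candidate_labelings(y_known, max_imbalance),
               key=lambda y: svm_dual_objective(alpha, K, y))


if __name__ == "__main__":
    # Toy semi-supervised setting: point 0 is labeled +1, points 1 and 2 are unlabeled.
    # Points 0 and 1 are very similar under K, so they should end up sharing a label.
    K = [[1.0, 0.9, 0.1],
         [0.9, 1.0, 0.1],
         [0.1, 0.1, 1.0]]
    alpha = [1.0, 1.0, 1.0]
    print(most_violated_labeling(alpha, K, [1, None, None], max_imbalance=1))  # (1, 1, -1)
```

Enumeration is exponential in the number of unlabeled points, so it is only useful for checking the idea on toy inputs. Any practical version would need a cheaper search for the violating labeling, and it would also need an SVM solver with the combined kernel sum_t mu_t (K ∘ y_t y_tᵀ) for the inner subproblem.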