Hierarchical label queries with data-dependent partitions
0.25
0.5
0.75
1.25
1.5
1.75
2
Given a joint distribution $P_{X, Y}$ over a space $\X$ and a label set $\Y=\braces{0, 1}$, we consider the problem of recovering the labels of an unlabeled sample with as few label queries as possible. Recovered labels can be passed to a passive learner, thus turning the procedure into an active learning approach. We analyze a family of labeling procedures based on a hierarchical clustering of the data. While such labeling procedures have been studied in the past, we provide a new parametrization of $P_{X, Y}$ that captures their behavior in general low-noise settings, and which accounts for data-dependent clustering, thus providing new theoretical underpinning to practically used tools.