We consider multi-label prediction problems with large output spaces under the assumption of output sparsity, namely that the target (label) vectors have small support. We develop a general theory for a var