Learning with Millions of Examples and Dimensions - Competition proposal
Over the years, many different classification methods have been proposed in machine learning. However, it is currently very difficult to judge which method is most efficient with respect to the practically relevant criteria: training time, memory requirements, and classification performance. A possible explanation for this difficulty is that methods are often evaluated under different conditions: for instance, different datasets, evaluation criteria, model parameters, and stopping conditions are used. We would therefore like to organize a competition that is designed to be fair and enables a direct comparison of current large-scale classifiers. To this end we plan to provide a generic evaluation framework tailored to the specifics of the competing methods; for example, for Support Vector Machine classifiers one would, in addition to the test error, record the objective value of the primal problem. Providing a wide range of datasets, each with specific properties (e.g., extremely sparse, dense, high- or low-dimensional), we propose to evaluate the methods based on the following figures: training time vs. test error, dataset size vs. test error, and dataset size vs. training time. We seek help from the community to gather relevant large-scale real-world datasets and to critically review and discuss fair evaluation criteria, and we invite researchers to co-organize and to participate in this challenge.