Regularization and Computations: Early stopping for Online Learning
Early stopping is one of the most appealing heuristics when dealing with big data, since the computational resources required for learning are directly linked to the desired generalization properties. Interestingly, the theoretical foundations of learning with early stopping have only recently been developed, and only for the case of classical batch gradient descent. In this talk, we discuss and analyze the potential impact of early stopping for online learning in a stochastic setting. More precisely, we study the estimator defined by incremental gradient descent on the (unregularized) empirical risk and show that it is universally consistent when provided with a universal step-size and a suitable early stopping rule. Our results shed light on the need to consider several passes over the data (epochs) in online learning.
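To make the setting concrete, here is a minimal sketch of incremental (single-sample) gradient descent over multiple passes, with a held-out-risk early stopping rule. The data, the decaying step-size schedule, and the patience-based stopping criterion are all illustrative assumptions, not the estimator or rule analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem (illustrative only).
n, d = 200, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Hold out part of the data to monitor risk for the stopping rule.
X_tr, y_tr = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

def val_risk(w):
    """Empirical risk on the held-out set (mean squared error)."""
    return np.mean((X_val @ w - y_val) ** 2)

w = np.zeros(d)
best_w, best_risk = w.copy(), val_risk(w)
patience, bad_epochs = 3, 0  # hypothetical patience-based stopping rule

for epoch in range(100):                     # multiple passes (epochs)
    step = 0.1 / (1 + epoch)                 # assumed decaying step-size
    for i in rng.permutation(len(y_tr)):     # incremental: one sample at a time
        grad = (X_tr[i] @ w - y_tr[i]) * X_tr[i]
        w -= step * grad
    risk = val_risk(w)
    if risk < best_risk:
        best_w, best_risk, bad_epochs = w.copy(), risk, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:           # stop when held-out risk stagnates
            break
```

Under this schedule the held-out risk typically improves over the first few passes and then plateaus, at which point the rule halts; the number of epochs actually taken plays the role of the regularization parameter.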