Unreasonable Effectiveness of Learning Artificial Neural Networks

Deep networks are among the most widely used tools in data science. Learning in these systems is in principle a hard problem, yet in practice heuristic algorithms often find solutions with good generalization properties. We propose an explanation of this good performance in terms of a novel large-deviation measure: we show that the optimization landscape contains regions that are both robust and accessible, and that their existence is crucial for achieving good performance on a class of particularly difficult learning problems. Building on these results, we introduce basic algorithmic schemes that improve existing optimization algorithms and provide a framework for further research on efficient learning for very large data sets and for novel computational technologies.