A PAC-Bayesian Analysis of Dropouts
Intuitively, a neural network that is robust to dropout perturbations should have better generalization properties: it should perform better on novel inputs. Stochastic model perturbation is the fundamental concept underlying PAC-Bayesian generalization theory. This talk will briefly summarize PAC-Bayesian generalization theory and give a regularization bound for a simple form of dropout training as a straightforward application. For a regularization bound involving an L2 penalty on the model weights, dropout reduces the regularization penalty by a factor of 1 - alpha, where alpha is the dropout rate. The bound then expresses a trade-off between the dropout rate and the training loss. While this regularization bound is intriguing, it may not be the right analysis. An alternative analysis involves variance reduction, the standard motivation for bagging. There are good reasons to believe that a certain general PAC-Bayes variance bound is significantly tighter than the general PAC-Bayes regularization bound. Unfortunately, the variance bound is opaque: it does not involve explicit regularization and is difficult to compare with regularization bounds. Also, unlike regularization bounds, there is no obvious method for designing algorithms that minimize the variance bound. A compelling variance-based PAC-Bayesian analysis of dropouts remains an open problem.
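As a rough illustration of where the factor of 1 - alpha can come from, the following sketches one form such a regularization bound might take. It is an assumption-laden sketch rather than the statement given in the talk: the symbols (sample size N, loss bound L_max, confidence parameter delta, trade-off parameter lambda > 1/2, weight vector Theta, prior P, and posterior Q) and the particular choice of a unit-variance Gaussian perturbation combined with a dropout mask are all introduced here for illustration.

```latex
% Assumed general PAC-Bayes regularization bound: with probability at least
% 1 - \delta over the draw of N training examples, simultaneously for all Q,
\[
  L(Q) \;\le\; \frac{1}{1 - \frac{1}{2\lambda}}
  \left( \hat{L}(Q) + \frac{\lambda L_{\max}}{N}
         \left( \mathrm{KL}(Q \,\|\, P) + \ln\frac{1}{\delta} \right) \right),
  \qquad \lambda > \tfrac{1}{2}.
\]

% Assumed dropout posterior Q_{\alpha,\Theta}: draw a mask s with each s_i = 0
% independently with probability \alpha (else s_i = 1), draw Gaussian noise
% \epsilon \sim \mathcal{N}(0, I), and use the weights s \circ (\Theta + \epsilon).
% Take the prior to be the same distribution centered at zero, P = Q_{\alpha,0}.
% The mask is shared, so only the surviving coordinates contribute to the KL term:
\[
  \mathrm{KL}(Q_{\alpha,\Theta} \,\|\, Q_{\alpha,0})
  \;=\; \mathbb{E}_{s}\!\left[ \tfrac{1}{2} \lVert s \circ \Theta \rVert^{2} \right]
  \;=\; \frac{1 - \alpha}{2} \lVert \Theta \rVert^{2}.
\]

% Substituting gives an L2 penalty scaled by (1 - \alpha):
\[
  L(Q_{\alpha,\Theta}) \;\le\; \frac{1}{1 - \frac{1}{2\lambda}}
  \left( \hat{L}(Q_{\alpha,\Theta}) + \frac{\lambda L_{\max}}{N}
         \left( \frac{1 - \alpha}{2} \lVert \Theta \rVert^{2}
                + \ln\frac{1}{\delta} \right) \right).
\]
```

Under these assumptions, raising alpha shrinks the penalty term but typically raises the training loss under dropout, which is the trade-off described above.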