Unsupervised Discovery of Structure, Succinct Representations and Sparsity
We describe a class of unsupervised learning methods that learn sparse representations of the training data, and thereby identify useful features. Further, we show that deep learning (multilayer) versions of these ideas, ones based on sparse DBNs, learn rich feature hierarchies, including part-whole decompositions of objects. Central to this is the idea of "probabilistic max pooling", which allows us to implement convolutional DBNs at a large scale, while maintaining probabilistically sound semantics. In the case of images, at the lowest level this method learns to detect edges; at the next level, it puts together edges to form "object parts"; and finally, at the highest level puts together object parts to form whole "object models". The features this method learns are useful for a wide range of tasks, including object recognition, text classification, and audio classification. We also present the result of comparing a two-layer version of the model (trained on natural images) to visual cortical areas V1 and V2 in the brain (the first and second stages of visual processing in the cortex). Finally, we'll conclude with a discussion on some open problems and directions for future research.