Hierarchical methods have been widely explored for object recognition, which is a critical component of scene understanding. However, few existing works are able to model the contextual information