Learning Representations of Large-scale Networks

Large-scale networks such as social networks, citation networks, the World Wide Web, and traffic networks are ubiquitous in the real world. Networks can also be constructed from text, time series, behavior logs, and many other types of data. Mining network data is attracting increasing attention in academia and industry, covers a variety of applications, and influences the methodology of mining many types of data. A prerequisite to network mining is finding an effective representation of networks, which largely determines the performance of downstream data mining tasks. Traditionally, networks are represented as adjacency matrices, which suffer from data sparsity and high dimensionality. Recently, there has been fast-growing interest in learning continuous and low-dimensional representations of networks. This is a challenging problem for multiple reasons: (1) network data (nodes and edges) are sparse, discrete, and globally interactive; (2) real-world networks are very large, usually containing millions of nodes and billions of edges; and (3) real-world networks are heterogeneous. Edges can be directed, undirected, or weighted, and both nodes and edges may carry different semantics. In this tutorial, we will introduce recent progress on learning continuous and low-dimensional representations of large-scale networks. This includes methods that learn embeddings of nodes, methods that learn representations of larger graph structures (e.g., an entire network), and methods that lay out very large networks in extremely low-dimensional (2D or 3D) spaces. We will introduce methods for learning different types of node representations: representations that can be used as features for node classification, community detection, link prediction, and network visualization.
We will also introduce end-to-end methods that use deep neural networks to learn the representation of an entire graph structure by directly optimizing tasks such as information cascade prediction, chemical compound classification, and protein structure classification. We will highlight open-source implementations of these techniques. Link to tutorial: https://sites.google.com/site/pkujiantang/home/kdd17-tutorial
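
To make the contrast between adjacency-matrix and low-dimensional representations concrete, the sketch below builds the adjacency matrix of a small undirected graph and derives 2-dimensional node vectors from a truncated SVD. This is only an illustrative matrix-factorization baseline, not one of the methods covered in the tutorial; the toy graph, the helper names `adjacency_matrix` and `svd_embedding`, and the choice of NumPy are all assumptions made for the example.

```python
import numpy as np


def adjacency_matrix(num_nodes, edges):
    """Dense symmetric 0/1 adjacency matrix of an undirected graph."""
    adj = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        adj[u, v] = 1.0
        adj[v, u] = 1.0
    return adj


def svd_embedding(adj, dim):
    """Node vectors from the top-`dim` singular vectors of `adj`.

    Each left singular vector is scaled by the square root of its
    singular value, a common convention for factorization embeddings.
    """
    u, s, _ = np.linalg.svd(adj)
    return u[:, :dim] * np.sqrt(s[:dim])


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = adjacency_matrix(6, edges)

# Fraction of non-zero entries in the adjacency matrix.
density = adj.sum() / adj.size

# One row per node, `dim` columns instead of one column per node.
embeddings = svd_embedding(adj, dim=2)

print("adjacency shape:", adj.shape)
print("embedding shape:", embeddings.shape)
print("density:", density)
```

In an adjacency matrix each node is described by one column per node in the network, so the row length grows with network size, while the embedding width stays fixed at `dim`. A dense matrix and a full SVD are practical only for very small graphs, which is part of the motivation for the scalable methods the tutorial covers.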