Data-Driven Approaches towards Malicious Behavior Modeling

The safety, reliability, and usability of web platforms are often compromised by malicious entities, such as vandals on Wikipedia, bot connections on Twitter, and fake likes on Facebook. Computational models built on large-scale, real-world behavioral data have made significant progress in identifying these entities. This tutorial discusses three broad directions of state-of-the-art data-driven methods for modeling malicious behavior: (i) feature-based algorithms, which propose distinguishing behavioral features to predict malicious users; (ii) spectral-based algorithms, which have been widely applied to directed, undirected, and bipartite graphs such as "who-follows-whom" Twitter data and "who-likes-what" Facebook data; and (iii) density-based algorithms, which efficiently search for suspicious, highly dense components in multi-dimensional behavioral data. The tutorial introduces the details of general algorithms from these three classes that can be applied to any platform and dataset. Link to tutorial: http://www.meng-jiang.com/tutorial-kdd17.html
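
To make the density-based direction concrete, here is a minimal sketch of one generic technique: greedy peeling on a bipartite "who-likes-what" graph. It repeatedly removes the lowest-degree node and keeps the node set with the highest edges-per-node density seen along the way. This is an illustration under stated assumptions, not necessarily any of the specific algorithms covered in the tutorial; the function name `densest_block`, the edge-list input format, and the toy data are all hypothetical.

```python
from collections import defaultdict


def densest_block(edges):
    """Greedy peeling over a bipartite (user, item) edge list.

    Returns the node set with the highest average density
    (edges / nodes) encountered while peeling, and that density.
    Hypothetical helper for illustration only.
    """
    # Build an undirected adjacency map; tag nodes so that a user id
    # and an item id with the same value never collide.
    adj = defaultdict(set)
    for user, item in edges:
        u, v = ("user", user), ("item", item)
        adj[u].add(v)
        adj[v].add(u)

    nodes = set(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_nodes = 0.0, set()

    while nodes:
        density = n_edges / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        # Remove the node with the fewest remaining neighbours
        # (ties broken by node label so the result is deterministic).
        victim = min(nodes, key=lambda n: (len(adj[n]), n))
        for nbr in adj[victim]:
            adj[nbr].discard(victim)
        n_edges -= len(adj[victim])
        del adj[victim]
        nodes.remove(victim)

    return best_nodes, best_density


# Toy data (made up): three accounts that all like the same three
# pages, plus two organic users with one like each.
likes = [(f, p) for f in ("f1", "f2", "f3") for p in ("a", "b", "c")]
likes += [("u1", "x"), ("u2", "y")]
block, density = densest_block(likes)
```

On this toy input the densest block found is the fully connected group of three accounts and three pages. The linear scan for the minimum-degree node keeps the sketch short; an implementation meant for large graphs would use a priority queue or degree buckets instead.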