Evolutionary Hierarchical Dirichlet Processes for Multiple Correlated Time-varying Corpora
Mining cluster evolution from multiple correlated time-varying text corpora is important in exploratory text analytics. In this paper, we propose an approach called evolutionary hierarchical Dirichlet processes~(EvoHDP) to discover interesting cluster evolution patterns from such text data. We formulate the EvoHDP as a series of hierarchical Dirichlet processes~(HDP) by adding time dependencies to the adjacent epochs, and propose a cascaded Gibbs sampling scheme to infer the model. This approach can discover different evolving patterns of clusters, including emergence, disappearance, evolution within a corpus and across different corpora. Experiments over synthetic and real-world multiple correlated time-varying data sets illustrate the effectiveness of EvoHDP on discovering cluster evolution patterns.