Retrieve (and Leverage) the Inner Graph Behind the Data
Session Chair: Gianluca Demartini

Many challenging data integration problems, in particular in data journalism, feature heterogeneity at the level of the schema and the data model. To overcome this heterogeneity, we have shown how data from many (semi)structured models can be converted into fine-granularity graphs, then enriched and densified with the help of information extraction. Such fine-grained graphs, however, are hard to grasp for non-technical users. To help them get acquainted with a dataset, we devised an abstraction method that identifies, in fine-granularity data graphs, objects endowed with an internal structure, together with the relationships between them. Given a semistructured dataset, we automatically produce an Entity-Relationship style diagram; in contrast with traditional E-R models, our entities may feature deep nesting, reflecting the nested and possibly recursive structure present in some data models. We thus obtain an automated way of "rescuing" the conceptual model, which we argue is best viewed as a graph, behind any application dataset. We then describe automatic techniques for finding the most interesting paths connecting entities in a dataset.
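
As a rough illustration of the two ideas above, the following minimal sketch converts a nested JSON-like document into a fine-granularity graph and then looks for a path between two value nodes. It is not the method presented in the talk: the function names `json_to_graph` and `shortest_path`, the node and edge labelling scheme, and the sample document are all assumptions made for this example, and a plain breadth-first shortest path stands in for the "most interesting paths" criterion, which the abstract does not define.

```python
from collections import deque
from itertools import count


def json_to_graph(doc):
    """Turn a nested JSON-like value into (nodes, edges).

    nodes maps a node id to a label; edges is a list of
    (parent_id, child_id, edge_label) triples. Hypothetical encoding.
    """
    nodes, edges = {}, []
    ids = count()

    def visit(value, label):
        node_id = next(ids)
        if isinstance(value, dict):
            nodes[node_id] = label
            for key, child in value.items():
                edges.append((node_id, visit(child, key), key))
        elif isinstance(value, list):
            nodes[node_id] = label
            for child in value:
                edges.append((node_id, visit(child, label), "item"))
        else:
            # Leaf values become their own nodes, labelled by the value.
            nodes[node_id] = str(value)
        return node_id

    visit(doc, "root")
    return nodes, edges


def shortest_path(edges, source, target):
    """Breadth-first search over the graph, ignoring edge direction."""
    adjacency = {}
    for parent, child, _ in edges:
        adjacency.setdefault(parent, []).append(child)
        adjacency.setdefault(child, []).append(parent)
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in adjacency.get(current, []):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None  # no connection between the two nodes


# Made-up sample document, for illustration only.
doc = {"article": {"author": "Alice", "cites": [{"author": "Bob"}]}}
nodes, edges = json_to_graph(doc)
alice = next(i for i, label in nodes.items() if label == "Alice")
bob = next(i for i, label in nodes.items() if label == "Bob")
path = shortest_path(edges, alice, bob)
print([nodes[i] for i in path])
```

Treating edges as undirected lets the search connect two leaf values through their common ancestor, which is one simple way to relate entities that never reference each other directly.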