Combining a co-occurrence-based and a semantic measure for entity linking
One key feature of the Semantic Web is the ability to link related Web resources. However, while relations within individual data sets are often well defined, links between disparate data sets and corpora of Web resources are rare. The increasingly widespread use of cross-domain reference data sets, such as Freebase and DBpedia, for annotating and enriching data sets and documents opens up opportunities to exploit their inherent semantic relationships to align disparate Web resources. In this paper, we present a combined approach to uncovering relationships between disparate entities that exploits (a) graph analysis of reference data sets together with (b) entity co-occurrence on the Web, measured with the help of search engines. For (a), we introduce a novel approach, adopted from social network theory, to measure the connectivity between given entities in reference data sets. The resulting connectivity measures are used to identify connected Web resources. Finally, we present a thorough evaluation of our approach using a publicly available data set and compare it with established measures in the field.