Detecting Erroneous Identity Links on the Web using Network Metrics
Although best practices for publishing Linked Data encourage the re-use of existing IRIs, multiple names are often used to denote the same thing. Whenever multiple names are used, owl:sameAs statements are needed to align them. Studies dating back as far as 2009 have observed multiple misuses of owl:sameAs links. As a result, the alignment of Linked Data is currently broken: many owl:sameAs links are erroneous, and some even introduce inconsistencies. In this paper, we show how network metrics, such as the community structure of the owl:sameAs graph, can be used to detect such (possibly) erroneous statements. We evaluate our method on a subset of the LOD Cloud that contains over 558M owl:sameAs statements.