Crowdsourcing Taxonomies

Jul 4, 2012, 4876 views

Taxonomies are a useful mechanism for organizing, evaluating, and searching web content. Many popular classes of web applications therefore rely on them, from product categorization, comparative pricing of similar products, and localized services to vertical or enterprise search. However, manual generation and maintenance by experts is time-consuming and cumbersome, and often results in static, platform-dependent vocabularies. Much current research therefore focuses on more flexible and dynamic methods for developing taxonomies, as evidenced, for example, by the strong interest in folksonomies within the social media realm. We propose a new approach for constructing taxonomies. Our idea stems from users' increasing involvement in, and desire for, tagging and annotating web content (e.g., in social media and product categorization applications). We define the required input from human users as explicit structural information, that is, supertype-subtype relationships between concepts, which humans understand well. In this way, we harvest, via common annotation practices, the collective wisdom of users with respect to the (categorization of the) web content they share and access. We further define the principles on which crowdsourced taxonomy construction algorithms should be based, and we show that the resulting problem is NP-hard. We provide heuristic algorithms, together with relevant optimizations, that aggregate human input, resolve conflicts in it, and produce taxonomies. We evaluate our algorithms with real-world crowdsourcing experiments (in which real users provide such information) and on real-world taxonomies.
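To make the idea of aggregating conflicting supertype-subtype input more concrete, here is a minimal sketch. It is not the algorithm presented in the talk, whose details the abstract does not give. It assumes a simple greedy majority-vote heuristic: count how often each (supertype, subtype) pair was submitted, then accept pairs in order of decreasing support, skipping any pair that would give a concept a second parent or create a cycle. The function name `build_taxonomy` and the sample votes are hypothetical.

```python
from collections import Counter


def build_taxonomy(votes):
    """Greedy sketch (an assumption, not the authors' method).

    votes: iterable of (supertype, subtype) pairs submitted by users.
    Returns a dict mapping each accepted subtype to its single supertype.
    """
    counts = Counter(votes)
    parent = {}

    def is_ancestor_or_self(a, b):
        # Walk up from b; True if a is b or lies on b's path to the root.
        while b is not None:
            if a == b:
                return True
            b = parent.get(b)
        return False

    # Strongest support first; ties broken alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for (sup, sub), _ in ordered:
        if sub in parent:
            continue  # conflict: a better-supported parent was already chosen
        if is_ancestor_or_self(sub, sup):
            continue  # conflict: accepting this pair would create a cycle
        parent[sub] = sup
    return parent


# Hypothetical user input with two conflicts.
votes = (
    [("animal", "dog")] * 3
    + [("dog", "animal")]      # contradicts the majority direction
    + [("animal", "cat")] * 2
    + [("dog", "puppy")] * 2
    + [("cat", "puppy")]       # competing parent for "puppy"
)
taxonomy = build_taxonomy(votes)
```

In this sample, the minority pairs `("dog", "animal")` and `("cat", "puppy")` are discarded, leaving `dog` and `cat` under `animal` and `puppy` under `dog`. A greedy pass like this is only one possible heuristic; since the abstract states that the underlying problem is NP-hard, no such simple pass should be expected to be optimal in general.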

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.