
Creation of Standards for Social Media Corpora: a Digital Humanities Topic Par Excellence

Jun 6, 2017, 1226 views

Although empirical research on computer-mediated communication (CMC) has a tradition of almost two decades, very few annotated CMC/social media corpora are available to the scientific community and the public. The main reason is the lack of standards and tools for collecting, representing, annotating and providing resources of this type. One crucial issue is the unclear legal situation regarding CMC/social media data. Using the example of a legal opinion sought for the integration of an existing German chat corpus into CLARIN-D, the talk highlights this issue (under German law) and describes how it was handled in the project. Another crucial issue is that, because of the distinct communicative characteristics of CMC/social media discourse, standards and tools for representing and annotating text corpora cannot be adopted for CMC/social media corpora without modification. Creating standards and adapting NLP tools for this new type of language resource is a digital humanities topic par excellence, since (1) it focuses on data that are born digital and (2) it requires combined expertise from the humanities and the computational sciences.


Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.