Processing social media data: can we circumvent the Tower of Babel?

Social media are known to be a rich and diverse source of information for many areas of research. However, they pose a series of processing challenges due to the linguistic and cultural diversity of their users. Standard language technologies have a much higher error rate on social media texts than on standard texts. Furthermore, researchers regularly need additional data about users, such as sociodemographic information. In the first part of my talk I will present a series of technology adaptations for processing varying language production, while in the second part I will give an overview of some experiments on language-independent user profiling, such as user type identification and gender prediction.