Multilingual Hate Speech Modeling by Leveraging Inter-Annotator Disagreement
As social media usage increases, so does the volume of toxic content on these platforms, motivating the Machine Learning (ML) community to focus on automating hate speech detection. While modern ML algorithms are known to provide nearly human-like results for a variety of downstream Natural Language Processing (NLP) tasks, the classification of hate speech remains an open challenge, partially due to its subjective annotation, which often leads to disagreement between annotators. This paper adopts a perspectivist approach that embraces subjectivity, leveraging conflicting annotations to enhance model performance in real-world scenarios. A state-of-the-art multilingual language model for hate speech detection is introduced, trained, and evaluated using diamond standard data with metrics that take disagreement into account. Various strategies for incorporating disagreement are compared in the process. Results demonstrate that the model performs as well as or better than the respective monolingual models on all evaluated languages and substantially outperforms them on multilingual data. This highlights the effectiveness of multilingual and perspectivist methods in addressing the complexities of hate speech detection. The presented multilingual hate speech detection model is available at: https://huggingface.co/IMSyPP/hate_speech_multilingual.