Toxic language detection on social media: a critical linguistic approach to online hate speech
DOI: https://doi.org/10.64268/jllm.v1i01.4

Keywords: Toxic Language Detection; Critical Discourse Analysis; Machine Learning; Hate Speech; Social Media; Linguistic Power Structures

Abstract
Background: The rise of hate speech on social media, especially during the COVID-19 pandemic, poses serious threats to psychological well-being and social cohesion. Although automated detection tools exist, they often fail to capture context and cultural nuance. This study explores the integration of Critical Discourse Analysis to improve the accuracy and fairness of toxic language detection on digital platforms.
Aim: This study aims to examine toxic language on social media by integrating an automated detection method based on machine learning with Critical Discourse Analysis (CDA), in order to understand how hate speech is produced, disseminated, and normalized within digital spaces.
Method: This study employs a qualitative-critical design. Data were collected by crawling public posts on social media platforms (Twitter and Facebook) using specific keywords. Toxic language was screened using a BERT-based machine learning classification model. From the automatically detected posts, 200 were purposively selected for further analysis using CDA, focusing on text structure, discursive practices, and social practices.
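The screen-then-sample step described above can be sketched as follows. This is a minimal illustration, not the study's code: `score_toxicity` is a hypothetical stand-in for the fine-tuned BERT classifier (a real pipeline would load one, e.g. via the Hugging Face `transformers` library), and the 0.5 threshold, the marker lexicon, and the random fallback sampler are all assumptions.

```python
import random

def score_toxicity(post: str) -> float:
    """Hypothetical stand-in for the BERT-based classifier described in the
    Method section; returns a toxicity score in [0, 1]."""
    toxic_markers = {"hate", "slur"}  # placeholder lexicon, not the study's
    hits = sum(word in post.lower() for word in toxic_markers)
    return min(1.0, hits / 2)

def screen_and_sample(posts, threshold=0.5, sample_size=200, seed=42):
    """Classify posts, keep those at or above the toxicity threshold,
    then draw a fixed-size subset for close (CDA) reading."""
    toxic = [p for p in posts if score_toxicity(p) >= threshold]
    rng = random.Random(seed)
    sample = toxic if len(toxic) <= sample_size else rng.sample(toxic, sample_size)
    return toxic, sample

posts = ["hate and slur example", "a neutral post", "another slur here"]
toxic, sample = screen_and_sample(posts, sample_size=2)
```

Note that the study's purposive selection is criterion-driven; the random draw above is only a placeholder for whatever selection criteria the authors applied.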
Result: The results reveal that 25.67% of the 15,000 posts analyzed were classified as toxic language. CDA uncovered that much of the toxic language did not appear explicitly but was instead concealed through irony, humor, and metaphor. The most prevalent targets of hate speech were racial issues (45%), followed by religion (28%), gender (15%), and sexual orientation (12%). Social media thus serves not only as a medium through which individuals disseminate hate speech but also as an arena for the reproduction of discriminatory ideologies.
Conclusion: This study makes methodological contributions to the development of fairer and more contextual digital content moderation systems and provides a foundation for policymakers to implement more effective regulations aimed at protecting digital spaces from hate speech.
License
Copyright (c) 2025 Arie Purwa Kusuma, Nurina Kurniasari Rahmawati

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.