The NorthCap University , Gurgaon , India
Cyberbullying poses significant societal and technological challenges and is a ubiquitous problem across social media platforms such as Twitter, Facebook, YouTube, and Instagram. This paper focuses on identifying hate speech that targets particular individuals or groups, specifically in Hindi-language data, aiming to bridge the gap between general abusive language and hatred directed at specific people or populations. To this end, a carefully curated and annotated corpus was created in which hate speech is divided into the following categories: racial/ethnic, religious, sexual orientation, and political. To address class imbalance, both the Synthetic Minority Over-sampling Technique (SMOTE) and Grouped SMOTE were applied to improve model performance. Traditional machine learning and deep learning models, including Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and Long Short-Term Memory networks (LSTM), were explored alongside the transformer-based Bidirectional Encoder Representations from Transformers (BERT). The results show that BERT outperforms the traditional models (F1-score of 89.2), supporting the first hypothesis. The second hypothesis was also supported: applying SMOTE increased both accuracy and precision. A strong correlation was found between the frequency of hate speech and demographic attributes, particularly with regard to racial and political biases, supporting the third hypothesis. The paper further shows that target-based classification outperforms binary classification models, validating the fourth hypothesis. Additional analysis reveals differing hate speech trends across platforms: politically charged hate speech is widespread on Twitter, whereas hate speech on Facebook is predominantly based on religious topics.
These findings emphasise the importance of building hate speech detection systems that are platform-adaptable, demographically aware, and grounded in sentiment analysis. This study contributes to context-sensitive content moderation and enhances the fairness of hate speech detection through natural language processing tools, and is of interest to artificial intelligence researchers, policy-makers, and social media platforms.
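The class-balancing step described above can be illustrated with a minimal SMOTE sketch. This is our own toy implementation for illustration only, not the paper's code: the feature vectors below are fabricated stand-ins for sentence embeddings, and the function, parameter names, and values are assumptions.

```python
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority-class samples by interpolating each
    chosen point toward one of its k nearest neighbours (core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (excluding x itself)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy imbalanced 2-D feature vectors (fabricated for this sketch)
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote(minority, n_synthetic=6)
print(len(new_points))  # 6 synthetic minority samples
```

Each synthetic point lies on a segment between two real minority samples, so the oversampled class occupies the same feature region rather than duplicating existing examples; in practice a library implementation such as imbalanced-learn would be used on the actual text embeddings.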
This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The statements, opinions, and data contained in the journal are solely those of the individual authors and contributors and not of the publisher or the editor(s). The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.