Cyberbullying is a ubiquitous problem across social media platforms such as Twitter, Facebook, YouTube, and Instagram, and it poses significant societal and technological challenges. This paper focuses on identifying hate speech that targets particular individuals or groups, specifically in Hindi-language data. It addresses the gap between detecting general abusive language and detecting hatred directed at specific people or populations. To this end, an annotated and carefully curated dataset was created in which hate speech is divided into four categories: racial/ethnic, religious, sexual orientation, and political. To address class imbalance, both the Synthetic Minority Over-sampling Technique (SMOTE) and Grouped SMOTE were applied to the data to improve model performance. Support Vector Machines (SVM) as a traditional machine learning model, Convolutional Neural Networks (CNN) and Long Short-Term Memory networks (LSTM) as deep learning models, and the transformer-based Bidirectional Encoder Representations from Transformers (BERT) were compared. BERT outperformed the other models with an F1-score of 89.2, supporting the first hypothesis. Applying SMOTE increased accuracy and precision, supporting the second hypothesis. A strong correlation was found between the frequency of hate speech and demographic attributes, particularly racial and political biases, supporting the third hypothesis. The study also shows that target-based classification outperforms binary classification, supporting the fourth hypothesis. Further analysis reveals platform-specific trends: politically charged hate is widespread on Twitter, whereas hate speech on Facebook centres mainly on religious topics.
These findings emphasise the importance of building hate speech detection systems that are platform-adaptable, demographically aware, and informed by sentiment analysis. This study contributes to context-sensitive content moderation, improves the fairness of hate speech detection with natural language processing tools, and is of interest to artificial intelligence researchers, decision-makers, and social network platforms.
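The abstract names SMOTE as the remedy for class imbalance but does not describe how it works or which implementation was used. As a minimal sketch, assuming each post has already been converted into a numeric feature vector (for example TF-IDF weights or sentence embeddings), basic SMOTE creates each synthetic minority sample by interpolating between a randomly chosen minority sample and one of its k nearest minority-class neighbours. The function names `smote` and `_k_nearest` and the toy data are illustrative assumptions, not the authors' code. Grouped SMOTE, which per its name adds grouping of minority samples and a noise-filtering step, is not shown.

```python
import math
import random


def _k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` by Euclidean distance."""
    ranked = sorted(candidates, key=lambda c: math.dist(point, c))
    return ranked[:k]


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate `n_synthetic` new minority-class vectors with basic SMOTE.

    Illustrative sketch: each synthetic sample lies on the line segment
    between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Exclude the base sample itself (by index, so duplicates still count).
        others = [m for i, m in enumerate(minority) if i != base_idx]
        neighbour = rng.choice(_k_nearest(base, others, k))
        gap = rng.random()  # interpolation factor, uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy data: 6 majority vectors, 3 minority vectors.
    majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [0.0, 0.3], [0.3, 0.0], [0.1, 0.1]]
    minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]
    new = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
    print(len(minority) + len(new))  # prints 6: classes are now balanced
```

Oversampling should be fitted on the training split only, so that synthetic points derived from test posts do not leak into evaluation. In practice, the imbalanced-learn library provides this technique as `imblearn.over_sampling.SMOTE`, whose `fit_resample(X, y)` method returns the resampled features and labels.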