A ROBUST MACHINE LEARNING-BASED ENSEMBLE LEARNING FRAMEWORK FOR HATE SPEECH DETECTION IN LOW-RESOURCE SOCIAL MEDIA TEXT

Husnain Saleem; Muhammad Javed; Kiran Hanif; Asad Ullah; Muhammad Usman Ghani; Muhammad Waqas; Muhammad Ali Khan; Sheraz Ali Hassan

doi:10.70102/afts.2025.1833.747

Original scientific article

Published: December 2025


https://doi.org/10.70102/afts.2025.1833.747


Abstract

This study identifies hate speech in low-resource social media text, specifically Urdu tweets, using a machine learning-based ensemble approach. The dataset consisted of 8,800 tweets, half labeled Hateful and half labeled No-Hate. Preprocessing accounted for characteristics of Urdu: characters were normalized, frequently occurring words were removed, and punctuation was filtered out. TF-IDF features were extracted from unigrams and bigrams, with the vocabulary limited to 5,000 terms. Logistic Regression, Multinomial Naive Bayes, and a Support Vector Classifier served as the base learners, and Logistic Regression was used again as the meta-learner in the final layer of the stacking ensemble. The data were split into 80% for training and 20% for testing. Evaluated against baseline ensembles and classifiers (Random Forest, Gradient Boosting, AdaBoost, Bagging, Soft Voting, and Hard Voting), the proposed machine learning-based stacking ensemble achieved an accuracy of 86.53%, precision of 85.45%, recall of 86.96%, and F1-score of 86.20%. These results indicate that a machine learning-based stacking ensemble is effective for identifying hate speech in Urdu tweets.
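The pipeline summarized above can be sketched with scikit-learn. This is a minimal, hypothetical reconstruction, not the authors' code. Only the TF-IDF settings (unigrams and bigrams, 5,000 terms), the base and meta learners, the 80/20 split, and the four metrics come from the abstract. Everything else is an assumption: the helper names (`normalize`, `preprocess`, `build_model`, `evaluate`, `CHAR_MAP`, `STOPWORDS`), the specific Arabic-to-Urdu character mappings, the diacritic range, the punctuation set, reading "frequently occurring words" as a stopword list (the list below is illustrative only), the linear SVC kernel, five-fold stacking, and the stratified split with `random_state=42`.

```python
import re
import string

from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumption: map common Arabic code points to their Urdu equivalents.
CHAR_MAP = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Farsi/Urdu yeh
    "\u0649": "\u06cc",  # alef maksura -> Farsi/Urdu yeh
    "\u0643": "\u06a9",  # Arabic kaf -> keheh
})
# Assumption: strip harakat (short-vowel marks) and superscript alef.
DIACRITICS_RE = re.compile("[\u064b-\u0652\u0670]")
# ASCII punctuation plus Urdu full stop, comma, question mark and semicolon.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "\u06d4\u060c\u061f\u061b]")


def normalize(text: str) -> str:
    """Unify letter variants, then remove diacritics."""
    return DIACRITICS_RE.sub("", text.translate(CHAR_MAP))


# Hypothetical, illustrative stopword list; normalized so it matches preprocessed tokens.
STOPWORDS = {normalize(w) for w in ["اور", "ہے", "کے", "کی", "کا", "میں", "سے", "کو", "نے", "یہ"]}


def preprocess(text: str) -> str:
    """Normalize, replace punctuation with spaces, drop stopwords; return space-joined tokens."""
    text = PUNCT_RE.sub(" ", normalize(text))
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)


def build_model(cv: int = 5) -> Pipeline:
    """TF-IDF (1-2 grams, 5,000 terms) feeding an LR/MNB/SVC stack with an LR meta-learner."""
    tfidf = TfidfVectorizer(
        preprocessor=preprocess,  # replaces sklearn's default lowercasing step
        tokenizer=str.split,      # preprocess already emits space-separated tokens
        token_pattern=None,
        ngram_range=(1, 2),
        max_features=5000,
    )
    stack = StackingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("mnb", MultinomialNB()),
            ("svc", SVC(kernel="linear")),  # assumed kernel; stacked via decision_function
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=cv,  # meta-learner is trained on out-of-fold base-learner outputs
    )
    return Pipeline([("tfidf", tfidf), ("stack", stack)])


def evaluate(texts, labels, pos_label="Hateful", seed=42):
    """80/20 stratified split, fit the stack, and report the abstract's four metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    model = build_model().fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return model, {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred, pos_label=pos_label),
        "recall": recall_score(y_te, pred, pos_label=pos_label),
        "f1": f1_score(y_te, pred, pos_label=pos_label),
    }
```

Calling `evaluate(texts, labels)` with a list of tweets and `"Hateful"`/`"No-Hate"` labels returns the fitted pipeline and a metrics dictionary. Putting the preprocessing inside `TfidfVectorizer` keeps the whole chain in one `Pipeline`, so raw tweets can be passed straight to `predict`. Because `StackingClassifier` fits the Logistic Regression meta-learner on cross-validated base-learner outputs, the meta-learner does not learn from predictions that base learners made on their own training rows.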

Keywords:

machine learning, stacking ensemble approach, TF-IDF, hate speech detection, Urdu tweets.



Copyright

This is an open access article distributed under the Creative Commons Attribution-NonCommercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Issue 33, 2025
