GRAPH-BASED METHODS FOR EXTRACTIVE ARABIC NEWS TEXT SUMMARIZATION

Rasha Almutairi; Sahar Jambi; Tawfiq Hasanin

doi:10.70102/afts.2026.1835.087

Original scientific article

Published: May 2026

Abstract

The rapid growth of digital content calls for effective Automatic Text Summarization (ATS) systems. Although summarization for high-resource languages has improved considerably, Arabic text summarization remains understudied, particularly with respect to comparative studies of document preprocessing methods and word embedding algorithms. This paper examines how four factors affect graph-based extractive summarization of Arabic news articles: preprocessing method, word embedding model, ranking algorithm, and compression ratio. Experiments were conducted on the Essex Arabic Summaries Corpus (EASC) with four preprocessing methods (Khoja, Farasa, Qalsadi, and Stanza), two word embedding models (GloVe and AraBERT), two ranking algorithms (PageRank and HITS), and two compression ratios (30% and 40%). Summary quality was measured by the ROUGE-1 F-score. All four factors had a statistically significant effect (p < 0.001). GloVe outperformed AraBERT (average ROUGE-1 F-score of 0.389 vs. 0.36), and the higher compression ratio (40%) achieved better performance. Among the preprocessing methods, Khoja and Farasa yielded nearly identical ROUGE-1 F-scores (0.381 and 0.379, respectively), while Stanza scored noticeably lower (0.364). Interactions between the preprocessing method and the word embedding model, ranking algorithm, and compression ratio were also statistically significant. Future work will incorporate larger and more varied datasets, together with human evaluation, to provide broader guidelines for choosing preprocessing and representation strategies for Arabic ATS systems. Further studies will also explore combining supervised summarization techniques with deep learning-based systems, as well as multilingual summarization.
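The pipeline summarized in the abstract (sentence graph, ranking, selection by compression ratio, ROUGE-1 scoring) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, the input vectors stand in for sentence embeddings that would come from GloVe or AraBERT after preprocessing, cosine similarity as the edge weight is an assumption, and the compression ratio is assumed to mean the fraction of sentences kept in the summary.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pagerank(weights, damping=0.85, max_iter=100, tol=1e-10):
    """Power-iteration PageRank on a weighted adjacency matrix (list of lists)."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                if out_sum[j] > 0:
                    # Node j passes its score along edges in proportion to weight.
                    incoming += scores[j] * weights[j][i] / out_sum[j]
                else:
                    # A node with no edges spreads its score uniformly.
                    incoming += scores[j] / n
            updated.append((1.0 - damping) / n + damping * incoming)
        change = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


def summarize(sentence_vectors, compression_ratio=0.4):
    """Return indices of the selected sentences, in document order.

    Assumption: compression_ratio is the fraction of sentences to keep.
    """
    n = len(sentence_vectors)
    # Fully connected sentence graph; negative similarities are clipped to 0.
    weights = [
        [
            max(cosine(sentence_vectors[i], sentence_vectors[j]), 0.0) if i != j else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = pagerank(weights)
    k = max(1, round(n * compression_ratio))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def rouge1_f(candidate_tokens, reference_tokens):
    """ROUGE-1 F-score: harmonic mean of unigram precision and recall."""
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)
```

Calling `summarize(vectors, compression_ratio=0.3)` on a list of per-sentence vectors returns the positions of the sentences to extract. Swapping `pagerank` for a HITS-style scorer would change only the ranking step, which is the kind of factor-by-factor comparison the study describes.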

Keywords:

extractive text summarization, Arabic text summarization, automatic text summarization (ATS), graph-based, stemming, lemmatization, word embeddings (WE).

Copyright

This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Issue 35, 2026
