Original scientific article

GRAPH-BASED METHODS FOR EXTRACTIVE ARABIC NEWS TEXT SUMMARIZATION

By
Rasha Almutairi

Department of Information Systems, King Abdulaziz University, Jeddah, Saudi Arabia

Sahar Jambi

Assistant Professor, Department of Information Systems, King Abdulaziz University, Jeddah, Saudi Arabia

Tawfiq Hasanin

Associate Professor, Department of Information Systems, King Abdulaziz University, Jeddah, Saudi Arabia

Abstract

The rapid growth of digital content calls for effective Automatic Text Summarization (ATS) systems. Although summarization for high-resource languages has improved substantially, Arabic text summarization remains understudied, particularly with respect to comparative studies of document preprocessing methods and word-embedding algorithms. This paper examines how four factors affect graph-based extractive summarization of Arabic news articles: preprocessing method, word embedding, ranking algorithm, and compression ratio. Experiments were conducted on the Essex Arabic Summary Corpus (EASC) with four preprocessing methods (Khoja, Farasa, Qalsadi, and Stanza), two word-embedding models (GloVe and AraBERT), two ranking algorithms (PageRank and HITS), and two compression ratios (30% and 40%). Summary quality was measured by the ROUGE-1 F-score. All four factors had a statistically significant effect (p < 0.001): GloVe outperformed AraBERT (average ROUGE-1 F-score of 0.389 vs. 0.36), and the higher compression ratio (40%) performed better than the lower one. Among the preprocessing methods, Khoja and Farasa yielded nearly identical ROUGE-1 F-scores (0.381 and 0.379, respectively), whereas Stanza scored lower (0.364). Statistically significant interactions were also found among the preprocessing method, word-embedding model, ranking algorithm, and compression ratio. Future research will incorporate larger and more varied datasets, along with human evaluation, to support more extensive guidelines for choosing preprocessing and representation strategies in Arabic ATS systems. Further work will also explore combining supervised summarization techniques with deep learning-based systems and multilingual summarization systems.
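To make the pipeline described in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each sentence is already represented by a vector (standing in for a GloVe or AraBERT embedding), builds a cosine-similarity graph, ranks sentences with a weighted PageRank (HITS is not shown), keeps a share of sentences set by the compression ratio, and scores a summary with a ROUGE-1 F-score. The function names, the damping factor of 0.85, and the reading of the compression ratio as the share of sentences retained are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pagerank_scores(vectors, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over a cosine-similarity graph."""
    n = len(vectors)
    # Edge weights: non-negative cosine similarity, no self-loops.
    w = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences with no outgoing weight spread their score uniformly.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0.0)
        scores = [
            (1 - damping) / n
            + damping * (dangling / n
                         + sum(scores[j] * w[j][i] / out[j]
                               for j in range(n) if out[j] > 0.0))
            for i in range(n)
        ]
    return scores


def summarize(sentences, vectors, ratio=0.4):
    """Keep the top `ratio` share of sentences (assumed meaning), in document order."""
    k = max(1, round(len(sentences) * ratio))
    scores = pagerank_scores(vectors)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge1_f(candidate_tokens, reference_tokens):
    """ROUGE-1 F-score computed from clipped unigram overlap."""
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, `summarize(sentences, vectors, ratio=0.3)` and `ratio=0.4` would correspond to the two compression settings, and `rouge1_f` would be applied to the tokens of the generated summary and of a reference summary after whichever preprocessing step (for example, stemming) is being compared.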



This is an open access article distributed under the Creative Commons Attribution Non-Commercial License (CC BY-NC), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

