Original scientific article

CONTEXT-AWARE RULE-BASED MATH EXPRESSION NORMALISER AND VERBALIZER USING LATEX2TEXT FOR ENHANCED DOCUMENT PREPROCESSING

By
J. Joice, Government Arts and Science College, Tiruppur, India
C. Sathya, Government Arts and Science College, Tiruppur, India

Abstract

Blind students face a substantial barrier when reading and accessing electronic documents, especially documents that are noisy or elaborately formatted. Traditional NLP models frequently ignore or misinterpret mathematical expressions, in which meaning is carried by symbolic notation. This is a critical problem in education, accessibility, and report-generation applications, where faithful treatment of mathematical content is a priority. State-of-the-art document summarisation systems tend to fail on noisy text, disordered document structures, and non-textual content such as equations, images, and charts. This paper introduces a preprocessing model that improves input quality, semantic coherence, and readability. The pipeline consists of sophisticated text cleaning, structure-aware organisation, and an extensive content-interpretation model. The paper proposes to simplify and verbalise mathematical expressions using a rule-based, context-sensitive component called the Verbalizer Rule (VR). The system translates complex mathematical syntax into human-readable natural-language descriptions by pattern-matching expressions and resolving their semantic meaning from contextual clues. Experiments demonstrate that this method achieves markedly higher readability scores and summarisation quality than state-of-the-art models. In the evaluation, the proposed CARMEN model attains ROUGE-1, ROUGE-2, and ROUGE-L scores above 0.8333, outperforming the other verbalizers.
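The abstract's rule-based verbalizer can be illustrated with a minimal standard-library sketch. This is not the paper's CARMEN implementation (which builds on LaTeX2Text); it is a hypothetical example of the general technique named in the abstract: ordered pattern-matching rules that rewrite LaTeX fragments into spoken English, applied repeatedly so nested expressions resolve step by step. All rule patterns and templates below are illustrative assumptions.

```python
import re

# Hypothetical rule table: each (pattern, template) pair rewrites one LaTeX
# construct into a natural-language phrase. More specific rules come first
# (e.g. "squared" before the generic power rule).
RULES = [
    (re.compile(r"\\frac\{([^{}]+)\}\{([^{}]+)\}"), r"\1 divided by \2"),
    (re.compile(r"\\sqrt\{([^{}]+)\}"), r"the square root of \1"),
    (re.compile(r"([A-Za-z0-9]+)\^\{?2\}?"), r"\1 squared"),
    (re.compile(r"([A-Za-z0-9]+)\^\{([^{}]+)\}"), r"\1 to the power of \2"),
    (re.compile(r"\\leq"), " is less than or equal to "),
    (re.compile(r"="), " equals "),
]

def verbalize(latex: str) -> str:
    """Apply the rule table until the expression stops changing, so that
    inner sub-expressions are verbalised before their enclosing context."""
    prev = None
    while latex != prev:
        prev = latex
        for pattern, template in RULES:
            latex = pattern.sub(template, latex)
    # Collapse whitespace introduced by the templates.
    return re.sub(r"\s+", " ", latex).strip()
```

For example, `verbalize(r"E = m c^{2}")` yields "E equals m c squared". A real context-sensitive verbalizer would additionally consult surrounding text (the "clues in the context" the abstract mentions) to choose between readings, e.g. rendering a superscript as an exponent versus an index.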


This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

