Original scientific article

CONTEXT-AWARE RULE-BASED MATH EXPRESSION NORMALISER AND VERBALIZER USING LATEX2TEXT FOR ENHANCED DOCUMENT PREPROCESSING

By
J. Joice

Research Scholar, PG and Research Department of Computer Science, Government Arts and Science College, Tiruppur, India

C. Sathya

Assistant Professor, PG and Research Department of Computer Science, Government Arts and Science College, Tiruppur, India

Abstract

Blind students face a substantial barrier when reading and accessing electronic documents, particularly documents that are noisy or heavily formatted. Traditional NLP models frequently ignore or misinterpret mathematical expressions, in which meaning is carried by symbolic notation. This is a critical problem in education, accessibility, and report-generation applications, where accurate understanding of mathematical content is essential. State-of-the-art document summarisation systems tend to fail on noisy text, disordered document structures, and non-textual content such as equations, images, and charts. This paper introduces a preprocessing model that improves input quality, semantic coherence, and readability. The pipeline consists of sophisticated text cleaning, structure-aware segmentation, and an extensive content-interpretation model. The paper proposes simplifying and verbalising mathematical expressions using a rule-based, context-sensitive component called the Verbalizer Rule (VR). The system translates complex mathematical syntax into human-readable natural-language descriptions by pattern-matching expressions and resolving their semantic meaning from contextual cues. Experiments demonstrate that this method achieves markedly higher readability scores and summarisation quality than state-of-the-art models. In the evaluation, the proposed CARMEN model achieves ROUGE-1, ROUGE-2, and ROUGE-L scores above 0.8333, outperforming the other verbalizers.
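The rule-based verbalisation the abstract describes — pattern-matching LaTeX fragments and substituting natural-language readings — can be sketched as follows. This is a minimal illustration with a handful of hypothetical rules, not the authors' CARMEN/VR implementation; a full system would use a proper LaTeX parser (e.g. pylatexenc's latex2text module, which the title alludes to) rather than flat regular expressions, and its rules would not handle nested braces the way this sketch cannot.

```python
import re

# Hypothetical verbalisation rules, applied in order. Patterns of the form
# [^{}]+ deliberately ignore nested braces to keep the sketch simple.
RULES = [
    (re.compile(r"\\frac\{([^{}]+)\}\{([^{}]+)\}"), r"\1 divided by \2"),
    (re.compile(r"\\sqrt\{([^{}]+)\}"), r"the square root of \1"),
    (re.compile(r"([A-Za-z0-9]+)\^\{?2\}?"), r"\1 squared"),
    (re.compile(r"\\leq"), " less than or equal to "),
    (re.compile(r"="), " equals "),
]

def verbalise(expr: str) -> str:
    """Rewrite a LaTeX math fragment as a natural-language reading."""
    for pattern, replacement in RULES:
        expr = pattern.sub(replacement, expr)
    # Collapse the whitespace the substitutions may have introduced.
    return re.sub(r"\s+", " ", expr).strip()

print(verbalise(r"\frac{a}{b} = c^2"))  # a divided by b equals c squared
```

A context-sensitive verbalizer of the kind the paper proposes would additionally condition each rule on surrounding text (for instance, reading `^2` as "squared" only for scalar operands), which is the part flat pattern tables like this cannot express.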


This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

