RULE-BASED ITERATIVE PREPROCESSING WITH DEEP SIAMESE GRU–BILSTM FOR EFFICIENT DOCUMENT STREAMING

K. Ranjit Kumar; S. Thirumaran

doi:10.70102/afts.2025.1834.833

Original scientific article

Published: December 2025

Abstract

Efficient document streaming requires robust preprocessing and semantic modeling to handle noise, redundancy, and morphological variation in large-scale text data. Existing stemming and document processing techniques often fail to preserve contextual relevance, which reduces classification and retrieval performance. To overcome this drawback, this paper proposes a Rule-based Pre-processing Iterative Stripping (RPIS) model coupled with a Deep Siamese GRU–BiLSTM model. RPIS systematically removes affixes according to linguistic rules, while the Siamese GRU–BiLSTM model captures bidirectional semantic dependencies between text segments. Experiments on benchmark datasets show that the proposed model achieves 95% training accuracy and 93% validation accuracy, outperforming traditional stemmers and standalone deep learning models. Error metrics are also low, with an MSE of 0.012, an MAE of 0.008, and an RMSE of 0.109. These findings indicate that rule-based preprocessing and deep semantic learning complement each other in improving the accuracy and resilience of document streaming, which makes the method suitable for large-scale document management systems.
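As an illustration of the iterative affix stripping idea described in the abstract, the following is a minimal Python sketch. It is not the authors' RPIS implementation: the rule table `SUFFIX_RULES`, the threshold `MIN_STEM_LEN`, and the function names are hypothetical and chosen only for demonstration. The sketch applies one suffix rule per pass and repeats until no rule fires.

```python
# Hypothetical rule table (an assumption, not taken from the paper):
# (suffix, replacement) pairs, checked in order, longer suffixes first.
SUFFIX_RULES = [
    ("ization", "ize"),
    ("ness", ""),
    ("ing", ""),
    ("ed", ""),
    ("ly", ""),
    ("s", ""),
]

# Hypothetical guard: never reduce a word below this many characters.
MIN_STEM_LEN = 3


def strip_once(word):
    """Apply the first matching suffix rule that leaves a long enough stem."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            if len(stem) >= MIN_STEM_LEN:
                return stem
    return word


def iterative_strip(word, max_passes=5):
    """Repeat single-pass stripping until the word stops changing."""
    word = word.lower()
    for _ in range(max_passes):
        stripped = strip_once(word)
        if stripped == word:
            break
        word = stripped
    return word


if __name__ == "__main__":
    for token in ["streamings", "walked", "Kindness", "is"]:
        print(token, "->", iterative_strip(token))
```

With this toy rule table, "streamings" is reduced in two passes (first the plural "s", then "ing"), which is the behavior that distinguishes iterative stripping from a single-pass stemmer. A rule set this small will over-stem many real words, so a practical system would need a much richer, linguistically motivated table.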

Keywords: rule-based stemming, iterative stripping, siamese neural networks, GRU–BiLSTM, document streaming, text preprocessing, semantic similarity.

Copyright

This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Issue 34, 2025
