Assistant Professor, Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Chidambaram, Tamil Nadu, India
Assistant Professor, Department of Computer Application, Alagappa Government Arts College, Karaikudi, Tamil Nadu, India
Efficient document streaming requires robust preprocessing and semantic modeling to handle noise, redundancy, and morphological variation in large-scale text data. Existing stemming and document processing techniques often fail to preserve contextual relevance, which reduces classification and retrieval performance. To overcome this limitation, this paper proposes a Rule-based Pre-processing Iterative Stripping (RPIS) model coupled with a Deep Siamese GRU-BiLSTM model. RPIS systematically removes affixes according to linguistic rules, while the Siamese GRU-BiLSTM model captures bidirectional semantic dependencies between text segments. Experiments on benchmark datasets show that the proposed model achieves 95% training accuracy and 93% validation accuracy, outperforming traditional stemmers and standalone deep learning models. Its error metrics are also considerably lower, with an MSE of 0.012, an MAE of 0.008, and an RMSE of 0.109. These findings indicate that rule-based preprocessing and deep semantic learning are complementary in improving the accuracy and resilience of document streaming, making the method suitable for large-scale document management systems.
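To make the idea of iterative affix stripping concrete, the following is a minimal sketch, not the authors' RPIS implementation. The affix lists, the minimum stem length, and the function names `strip_once` and `iterative_strip` are hypothetical choices for illustration only. The sketch removes one affix per pass (longest matching suffix first, otherwise longest matching prefix) and repeats until the word no longer changes.

```python
# Hypothetical affix rules for illustration only; the actual RPIS rule set
# is not specified in the abstract.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ational", "ness", "ing", "ed", "ly", "es", "s")
MIN_STEM_LEN = 3  # assumed guard so stripping never leaves a too-short stem


def strip_once(word: str) -> str:
    """Remove at most one affix: the longest matching suffix, else the longest prefix."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            return word[len(prefix):]
    return word


def iterative_strip(word: str, max_passes: int = 5) -> str:
    """Apply strip_once repeatedly until the word stops changing (a fixed point)."""
    current = word.lower()
    for _ in range(max_passes):
        stripped = strip_once(current)
        if stripped == current:
            break
        current = stripped
    return current


if __name__ == "__main__":
    for w in ("unkindness", "replayed", "Walking", "runs", "is"):
        print(w, "->", iterative_strip(w))
```

Under these assumed rules, "unkindness" is reduced to "kind" in two passes (suffix "ness", then prefix "un"). The minimum-stem-length guard keeps short words such as "is" intact, and the pass limit bounds the loop even if a rule set were to cycle.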
This is an open access article distributed under the Creative Commons Attribution-NonCommercial (CC BY-NC) License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The statements, opinions and data contained in the journal are solely those of the individual authors and contributors and not of the publisher and the editor(s). We stay neutral with regard to jurisdictional claims in published maps and institutional affiliations.