Bukhara State Medical Institute named after Abu Ali ibn Sino, Bukhara, Uzbekistan
Department of Uzbek Language and Literature with Russian Language, Samarkand State Medical University, Samarkand, Uzbekistan
Senior Teacher, Tashkent University of Information Technologies named after Muhammad Al-Khwarizmi, Tashkent, Uzbekistan
Department of English Language and Literature, Termez University of Economics and Service, Termez, Uzbekistan
Lecturer, Jizzakh State Pedagogical University, Jizzakh, Uzbekistan
Associate Professor, Vice-Rector for Science and Innovations, Uzbekistan State University of World Languages, Tashkent, Uzbekistan
Manual comparative methods have long been the main means of reconstructing Proto-Indo-European (PIE) dialects, but they suffer from fragmentary corpora, interpretive bias, and a lack of direct textual evidence. This paper introduces a probabilistic semantic reconstruction model that combines computational comparative linguistics with deep neural archiving to learn and reconstruct the dialectal variation lost in PIE. A multilingual dataset covering 12 Indo-European language branches and 18,742 cognate sets, annotated with phonological, morphological, and semantic feature embeddings, was compiled as model input. Proto-forms and semantic shifts were predicted using a Bayesian inverse phylogenetic inference model together with a transformer-based deep neural network trained on 4.6 million aligned lexical tokens. Evaluated against established scholarly reconstructions, the model achieved 86.3% accuracy in phonological reconstruction and a semantic consistency of 0.81 (cosine similarity). Cross-validation showed a 14.7% reduction in reconstruction variance relative to traditional rule-based methods. Probabilistic confidence intervals (95% CI) indicated consistent predictions for high-frequency lexical roots, and 63% of the reconstructed forms had posterior probabilities greater than 0.90. Moreover, dialectal clustering analysis revealed statistically significant divergence patterns (p < 0.01) consistent with established Indo-European subgroup stratification. The results show that probabilistic modelling combined with deep neural semantic archiving can substantially improve the reliability and interpretability of reconstruction. The framework offers a scalable and replicable computational approach to historical linguistics and provides a new quantitative perspective on proto-language evolution and dialect differentiation within the Indo-European family.
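The abstract reports semantic consistency as a cosine similarity score and summarizes confidence as the share of reconstructed forms with posterior probability above 0.90. The Python sketch below shows one way such summary figures could be computed. It assumes, without confirmation from the paper, that semantic consistency is the mean cosine similarity between paired predicted and reference semantic embeddings and that a posterior probability is available for each reconstructed form. The function names and toy vectors are hypothetical and are not taken from the authors' pipeline.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def semantic_consistency(predicted: List[Sequence[float]],
                         reference: List[Sequence[float]]) -> float:
    """Hypothetical metric: mean cosine similarity over paired embeddings."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need the same, non-zero number of predicted and reference vectors")
    scores = [cosine_similarity(p, r) for p, r in zip(predicted, reference)]
    return sum(scores) / len(scores)


def share_above(posteriors: Sequence[float], threshold: float = 0.90) -> float:
    """Fraction of reconstructed forms whose posterior strictly exceeds threshold."""
    if not posteriors:
        raise ValueError("no posterior probabilities given")
    return sum(1 for p in posteriors if p > threshold) / len(posteriors)


# Toy example with made-up 2-D embeddings and posteriors (not the paper's data).
predicted = [[1.0, 0.0], [0.6, 0.8]]
reference = [[1.0, 0.0], [0.0, 1.0]]
print(semantic_consistency(predicted, reference))   # approximately 0.9
print(share_above([0.95, 0.92, 0.70, 0.99]))        # 3 of 4 forms exceed 0.90
```

Averaging per-pair cosine scores treats every cognate set equally; a real evaluation might instead weight sets by frequency, which the abstract does not specify.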
This is an open access article distributed under the Creative Commons Attribution-NonCommercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The statements, opinions, and data contained in the journal are solely those of the individual authors and contributors and not of the publisher and the editor(s). The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.