Original scientific article

PROBABILISTIC SEMANTIC RECONSTRUCTION OF LOST PROTO INDO-EUROPEAN DIALECTS USING COMPUTATIONAL COMPARATIVE LINGUISTIC MODELING AND DEEP NEURAL ARCHIVING

By
Mastura Tadjieva

Termez State University, Termez, Uzbekistan

Zaynab Matniyazova

Bukhara State Medical Institute named after Abu Ali ibn Sino, Bukhara, Uzbekistan

Zarina Djumayeva

Department of Uzbek Language and Literature with Russian Language, Samarkand State Medical University, Samarkand, Uzbekistan

Khilola Umarkhujaeva

Senior Teacher, Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi, Tashkent, Uzbekistan

Mokhirukh Khoshimkhujaeva

Department of English Language and Literature, Termez University of Economics and Service, Termez, Uzbekistan

Mohira Ankabayeva

Lecturer, Jizzakh State Pedagogical University, Jizzakh, Uzbekistan

Otabek Yusupov

Associate Professor, Vice-Rector for Science and Innovations, Uzbekistan State University of World Languages, Tashkent, Uzbekistan

Abstract

Manual comparative methods have long been the main means of reconstructing Proto-Indo-European (PIE) dialects, but they are limited by fragmentary corpora, interpretive bias, and a lack of direct textual evidence. This paper introduces a probabilistic semantic reconstruction model that combines computational comparative linguistics and deep neural archiving to learn and reconstruct dialectal variation that has been lost in PIE. A multilingual dataset covering 12 Indo-European language branches and 18,742 cognate sets, with phonological, morphological, and semantic feature embeddings, was compiled. An inverse phylogenetic model based on Bayesian inference and a transformer-based deep neural network trained on 4.6 million aligned lexical tokens were used to predict proto-forms and semantic shifts. When tested against established scholarly reconstructions, the proposed model achieved 86.3% accuracy in phonological reconstruction and a semantic consistency of 0.81 (cosine similarity). Cross-validation indicated a 14.7% decrease in reconstruction variance compared with traditional rule-based methods. Probabilistic confidence intervals (95% CI) showed consistent predictions for high-frequency lexical roots, with posterior probabilities greater than 0.90 for 63% of the reconstructed forms. Moreover, the dialectal clustering analysis revealed statistically significant divergence patterns (p < 0.01) that were consistent with established Indo-European subgroup stratifications. The results show that probabilistic modelling combined with deep neural semantic archiving can significantly improve the reliability and interpretability of reconstruction. The framework offers a computational approach to historical linguistics that can be scaled and replicated, and it provides a new quantitative understanding of the evolution of proto-languages and of dialect differentiation within the Indo-European family.
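The two quantities reported in the abstract, a cosine-similarity score for semantic consistency and a posterior probability for each reconstructed form, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy vectors, and the prior and likelihood values are all hypothetical, and the posterior shown is a plain application of Bayes' rule over a small set of candidate forms.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def posterior(priors, likelihoods):
    """Bayes' rule over candidate proto-forms.

    priors[f]      : prior probability of candidate f (hypothetical values)
    likelihoods[f] : probability of the attested reflexes given f (hypothetical)
    """
    joint = {f: priors[f] * likelihoods[f] for f in priors}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("all candidates have zero joint probability")
    return {f: p / evidence for f, p in joint.items()}


# Toy data, invented purely for illustration.
reconstructed_vec = [0.2, 0.7, 0.1]
reference_vec = [0.25, 0.65, 0.05]
print(cosine_similarity(reconstructed_vec, reference_vec))

priors = {"candidate_a": 0.5, "candidate_b": 0.3, "candidate_c": 0.2}
likelihoods = {"candidate_a": 0.9, "candidate_b": 0.2, "candidate_c": 0.1}
post = posterior(priors, likelihoods)
print(max(post, key=post.get), post)
```

Under this reading, a reconstructed form would count toward the high-confidence group when its posterior exceeds 0.90, and semantic consistency would be the cosine similarity between the embedding of a model reconstruction and that of a reference reconstruction. How the paper actually aggregates these scores is not stated in the abstract.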

This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Issue 35, 2026

The statements, opinions and data contained in the journal are solely those of the individual authors and contributors and not of the publisher and the editor(s). We stay neutral with regard to jurisdictional claims in published maps and institutional affiliations.