PROBABILISTIC SEMANTIC RECONSTRUCTION OF LOST PROTO INDO-EUROPEAN DIALECTS USING COMPUTATIONAL COMPARATIVE LINGUISTIC MODELING AND DEEP NEURAL ARCHIVING

Mastura Tadjieva; Zaynab Matniyazova; Zarina Djumayeva; Khilola Umarkhujaeva; Mokhirukh Khoshimkhujaeva; Mohira Ankabayeva; Otabek Yusupov

doi:10.70102/afts.2026.1835.049

Original scientific article

Published: May 2026

Abstract

Manual comparative methods have long been the main means of reconstructing Proto-Indo-European (PIE) dialects, but they are limited by fragmentary corpora, interpretive bias, and a lack of direct textual evidence. This paper introduces a probabilistic semantic reconstruction model that combines computational comparative linguistics and deep neural archiving to learn and reconstruct lost dialectal variation in PIE. A multilingual dataset covering 12 Indo-European language branches and 18,742 cognate sets, with phonological, morphological, and semantic feature embeddings, was compiled. A Bayesian inverse phylogenetic inference model and a transformer-based deep neural network trained on 4.6 million aligned lexical tokens were used to predict proto-forms and semantic shifts. When tested against known scholarly reconstructions, the proposed model achieved 86.3% accuracy in phonological reconstruction and a semantic consistency of 0.81 (cosine similarity). Cross-validation indicated a 14.7% decrease in reconstruction variance compared to traditional rule-based methods. Probabilistic confidence intervals (95% CI) also showed consistent predictions for high-frequency lexical roots, with posterior probabilities greater than 0.90 for 63% of the reconstructed forms. Moreover, the dialectal clustering analysis revealed statistically significant divergence patterns (p < 0.01) that were consistent with established Indo-European subgroup stratifications. The results show that probabilistic modelling with deep neural semantic archiving can significantly improve the reliability and interpretability of reconstruction. This framework offers a scalable and replicable computational approach to historical linguistics. It also provides a new quantitative understanding of the evolution of proto-languages and dialect differentiation within the Indo-European family.
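
The abstract reports two summary quantities: a semantic consistency score based on cosine similarity, and the share of reconstructed forms whose posterior probability exceeds 0.90. The sketch below illustrates how such quantities could be computed in principle. It is not the authors' implementation; the function names, the averaging of per-item similarities, and the toy inputs are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_semantic_consistency(predicted, reference):
    """Average cosine similarity over paired embeddings.

    Assumption: consistency is summarised as a plain mean over items;
    the abstract does not state how the per-item scores are aggregated.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    scores = [cosine_similarity(p, r) for p, r in zip(predicted, reference)]
    return sum(scores) / len(scores)


def share_above_threshold(posteriors, threshold=0.90):
    """Fraction of posterior probabilities strictly greater than threshold."""
    if not posteriors:
        raise ValueError("posteriors must be non-empty")
    return sum(1 for p in posteriors if p > threshold) / len(posteriors)


# Toy, hypothetical data: two predicted embeddings against two references,
# and four posterior probabilities for reconstructed forms.
toy_predicted = [[1.0, 0.0], [1.0, 0.0]]
toy_reference = [[1.0, 0.0], [0.0, 1.0]]
toy_posteriors = [0.95, 0.50, 0.91, 0.90]

consistency = mean_semantic_consistency(toy_predicted, toy_reference)
high_confidence_share = share_above_threshold(toy_posteriors)
```

With real data, the inputs would be the model's predicted semantic embeddings, the embeddings of the reference reconstructions, and the per-form posterior probabilities; none of these are available on this page.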

Keywords: Proto-Indo-European reconstruction, computational comparative linguistics, probabilistic phylogenetic modelling, deep neural language models, semantic embedding analysis, historical linguistics digitisation, dialectal evolution modelling

Copyright

This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Issue 35, 2026
