A systematic exploration of current limitations of cognate-based phylogenetic inference

Häuser, Luise 1; Jäger, Gerhard; Stamatakis, Alexandros P. 1
1 Institut für Theoretische Informatik (ITI), Karlsruher Institut für Technologie (KIT)

Abstract:

Background
Computational tools for phylogenetic inference are now routinely applied to data from historical linguistics, especially cognate data.

Methods
We first provide an overview of the cognate datasets that are currently publicly available and compare the amount of cognate data with the amount of available molecular data. Then, we outline the drawbacks of the standard binary cognate data representation and introduce an alternative representation that alleviates some of these disadvantages. We also introduce dedicated, parameter-rich evolutionary models for this novel representation. We implement the model and investigate its behavior. In addition, we conduct an orthogonal experiment to investigate whether machine learning-based approaches can be used for cognate data.

Results
Our experiments show that our newly introduced models cannot currently be applied, as they exhibit clear indications of overparameterization due to the small size of the available cognate datasets. We demonstrate that, for the same reason, the applicability of emerging machine learning-based approaches to cognate data is highly limited.

Conclusion
...


Publisher's version
DOI: 10.5445/IR/1000190716
Published on 17 February 2026
Original publication
DOI: 10.12688/openreseurope.20351.3
Affiliated institution(s) at KIT: Institut für Theoretische Informatik (ITI)
Publication type: Journal article
Publication year: 2026
Language: English
Identifiers: ISSN 2732-5121; KITopen-ID 1000190716
Published in: Open Research Europe
Publisher: European Commission (EU)
Volume: 5
Pages: 258
Published online in advance on 28 January 2026
Keywords: Phylogenetic Inference, Historical Linguistics, Maximum Likelihood, Evolutionary Model, Cognate Data, Machine Learning, Phylogenetic Difficulty
Indexed in: Scopus, OpenAlex