
A LASSO-based approach to sample sites for phylogenetic tree search

Ecker, Noa; Azouri, Dana; Bettisworth, Ben 1; Stamatakis, Alexandros 1; Mansour, Yishay; Mayrose, Itay; Pupko, Tal
1 Institut für Theoretische Informatik (ITI), Karlsruher Institut für Technologie (KIT)

Abstract:

Motivation
In recent years, full-genome sequences have become increasingly available and as a result many modern phylogenetic analyses are based on very long sequences, often with over 100 000 sites. Phylogenetic reconstructions of large-scale alignments are challenging for likelihood-based phylogenetic inference programs and usually require using a powerful computer cluster. Current tools for alignment trimming prior to phylogenetic analysis do not promise a significant reduction in the alignment size and are claimed to have a negative effect on the accuracy of the obtained tree.

Results
Here, we propose an artificial-intelligence-based approach that provides a means to select the optimal subset of sites, together with a formula by which one can compute the log-likelihood of the entire data based on this subset. Our approach is based on training a regularized Lasso-regression model that optimizes the log-likelihood prediction accuracy while constraining the number of sites used for the approximation. We show that computing the likelihood based on 5% of the sites already provides an accurate approximation of the tree likelihood based on the entire data. ...
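The regression idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that per-site log-likelihoods are already available for a set of training trees, substitutes synthetic numbers for them, and fits a generic coordinate-descent Lasso in pure Python. The names `fit_lasso`, `site_ll`, `total_ll`, and `alpha` are hypothetical and chosen for this example only.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha, n_iter=200):
    """Cyclic coordinate descent for (1/2n)*||y - b - Xw||^2 + alpha*||w||_1."""
    n, p = len(X), len(X[0])
    # Center columns and target so the intercept can be recovered afterwards.
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    residual = [yi - y_mean for yi in y]  # centered y minus Xc*w, with w = 0
    w = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            # Covariance of column j with the residual that excludes j's own term.
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * Xc[i][j]
                w[j] = new_w
    intercept = y_mean - sum(w[j] * col_mean[j] for j in range(p))
    return w, intercept


def predict(X, w, intercept):
    """Weighted sum of the per-site values plus the intercept, one value per row."""
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in data (an assumption, not real phylogenetic output):
# each row is one training tree, each column one alignment site.
rng = random.Random(0)
n_trees, n_sites = 60, 40
base = [rng.uniform(-8.0, -2.0) for _ in range(n_sites)]
loading = [rng.uniform(0.2, 1.5) for _ in range(n_sites)]
site_ll = []
for _ in range(n_trees):
    tree_quality = rng.gauss(0.0, 1.0)  # shared factor that correlates the sites
    site_ll.append([base[j] + loading[j] * tree_quality + rng.gauss(0.0, 0.3)
                    for j in range(n_sites)])
total_ll = [sum(row) for row in site_ll]  # regression target: full log-likelihood

# Smallest penalty at which every weight stays zero; alpha is set to a fraction of it.
y_mean = sum(total_ll) / n_trees
alpha_max = max(
    abs(sum(row[j] * (y - y_mean) for row, y in zip(site_ll, total_ll))) / n_trees
    for j in range(n_sites)
)

weights, intercept = fit_lasso(site_ll, total_ll, alpha=0.05 * alpha_max)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
approx_ll = predict(site_ll, weights, intercept)

print(f"{len(selected)} of {n_sites} sites carry a non-zero weight")
print(f"training MSE of the approximation: {mse(total_ll, approx_ll):.4f}")
```

In this sketch, only the sites listed in `selected` are needed to evaluate the approximation for a new tree. A larger `alpha` drives more weights to exactly zero, trading approximation accuracy for fewer sites; the 5% figure quoted in the abstract refers to the authors' experiments, not to this toy data.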


Publisher's version
DOI: 10.5445/IR/1000148294
Published on 2 August 2022
Original publication
DOI: 10.1093/bioinformatics/btac252
Scopus citations: 1
Dimensions citations: 2
Affiliated institution(s) at KIT: Institut für Theoretische Informatik (ITI)
Publication type: Journal article
Publication date: 24 June 2022
Language: English
Identifier: ISSN: 1367-4811, 0266-7061, 1367-4803, 1460-2059
KITopen-ID: 1000148294
Published in: Bioinformatics (Oxford, England)
Publisher: Oxford University Press (OUP)
Volume: 38
Issue: Suppl. 1
Pages: i118–i124
Indexed in: Scopus, Web of Science, Dimensions