Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult

Morel, Benoit; Barbera, Pierre; Czech, Lucas; Bettisworth, Ben; Hübner, Lukas 1; Lutteropp, Sarah; Serdari, Dora; Kostaki, Evangelia-Georgia; Mamais, Ioannis; Kozlov, Alexey M.; Pavlidis, Pavlos; Paraskevis, Dimitrios; Stamatakis, Alexandros 1
1 Institut für Theoretische Informatik (ITI), Karlsruher Institut für Technologie (KIT)

Abstract:

Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.


Publisher's version
DOI: 10.5445/IR/1000133650
Published on June 8, 2021
Original publication
DOI: 10.1093/molbev/msaa314
Scopus
Citations: 84
Web of Science
Citations: 81
Dimensions
Citations: 150
Affiliated institution(s) at KIT: Institut für Theoretische Informatik (ITI)
Publication type: Journal article
Publication year: 2021
Language: English
Identifier: ISSN: 1537-1719, 0737-4038
KITopen-ID: 1000133650
Published in: Molecular Biology and Evolution
Publisher: Oxford University Press (OUP)
Volume: 38
Issue: 5
Pages: 1777-1791
Published online ahead of print: December 15, 2020
Keywords: SARS-CoV-2, phylogenetic inference, phylogeny rooting, outgroups, strain classification
Indexed in: Web of Science, Dimensions, Scopus
Sustainable Development Goals: Goal 3: Good Health and Well-being