Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult

Morel, Benoit; Barbera, Pierre; Czech, Lucas; Bettisworth, Ben; Hübner, Lukas; Lutteropp, Sarah; Serdari, Dora; Kostaki, Evangelia-Georgia; Mamais, Ioannis; Kozlov, Alexey M.; Pavlidis, Pavlos; Paraskevis, Dimitrios; Stamatakis, Alexandros

doi:10.1093/molbev/msaa314

Affiliation of Hübner, Lukas and Stamatakis, Alexandros: Institut für Theoretische Informatik (ITI), Karlsruher Institut für Technologie (KIT)

Abstract:

Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies, using the example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence, either via the bat and pangolin outgroups or by applying novel computational methods to the ingroup phylogeny, does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.
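
The point about many sequences combined with few mutations can be illustrated with a small Python sketch. It is not the authors' pipeline; the function names and the toy alignment are hypothetical and chosen only for illustration. The sketch counts parsimony-informative alignment columns and compares that count with the n - 3 internal branches of a fully resolved unrooted binary tree on n sequences. As a rough rule of thumb, each internal branch needs at least one substitution supporting it, so an alignment with far fewer informative sites than internal branches is unlikely to resolve every branch.

```python
from collections import Counter


def parsimony_informative_sites(alignment):
    """Count columns where at least two nucleotides each occur in
    at least two sequences (gaps and ambiguity codes are ignored)."""
    valid = set("ACGT")
    informative = 0
    for column in zip(*alignment):
        freq = Counter(ch.upper() for ch in column if ch.upper() in valid)
        states_seen_twice = sum(1 for n in freq.values() if n >= 2)
        if states_seen_twice >= 2:
            informative += 1
    return informative


def internal_branches(n_taxa):
    """Number of internal branches in a fully resolved unrooted binary tree."""
    return max(n_taxa - 3, 0)


# Hypothetical toy alignment: five aligned sequences of equal length.
toy_alignment = [
    "ACGTACGT",
    "ACGTACGA",
    "ACGAACGA",
    "ACGAACGT",
    "ACGTACGT",
]

n_informative = parsimony_informative_sites(toy_alignment)
n_branches = internal_branches(len(toy_alignment))
print(f"informative sites: {n_informative}, internal branches: {n_branches}")

# For the 8,736 sequences mentioned in the abstract, a fully resolved
# tree would have this many internal branches to support:
print(internal_branches(8736))
```

Applied to a real alignment, the same comparison gives only a coarse indication of how much signal is available; it does not replace a proper inference and support analysis.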

Publisher's version (KITopen)

DOI: 10.5445/IR/1000133650

Published on 8 June 2021

External links

Original publication
DOI: 10.1093/molbev/msaa314


Affiliated institution(s) at KIT	Institut für Theoretische Informatik (ITI)
Publication type	Journal article
Publication year	2021
Language	English
Identifier	ISSN: 1537-1719, 0737-4038; KITopen-ID: 1000133650
Published in	Molecular biology and evolution
Publisher	Oxford University Press (OUP)
Volume	38
Issue	5
Pages	1777-1791
Published online ahead of print on	15 December 2020
Keywords	SARS-CoV-2, phylogenetic inference, phylogeny rooting, outgroups, strain classification
Indexed in	Web of Science, OpenAlex, Scopus, Dimensions