Abstract

The Genome Taxonomy Database is a phylogenetically consistent, genome-based taxonomy that provides rank-normalized classifications for ~150,000 bacterial and archaeal genomes from domain to genus. However, almost 40% of the genomes in the Genome Taxonomy Database lack a species name. We address this limitation by using commonly accepted average nucleotide identity criteria to set bounds on species and propose species clusters that encompass all publicly available bacterial and archaeal genomes. Unlike previous average nucleotide identity studies, we chose a single representative genome to serve as the effective nomenclatural 'type' defining each species. Of the 24,706 proposed species clusters, 8,792 are based on published names. We assigned placeholder names to the remaining 15,914 species clusters to provide names to the growing number of genomes from uncultivated species. This resource provides a complete domain-to-species taxonomic framework for bacterial and archaeal genomes, which will facilitate research on uncultivated species and improve communication of scientific results.

Keywords

Genome; Biology; Taxonomy (biology); Species name; Archaea; Bacterial taxonomy; Bacterial genome size; Evolutionary biology; Taxonomic rank; Phylogenetics; Computational biology; Genetics; Zoology; Ecology; Bacteria; Taxon; Gene; 16S ribosomal RNA

MeSH Terms

Archaea; Bacteria; Databases, Genetic; Genome, Archaeal; Genome, Bacterial; Nucleic Acid Hybridization; Phylogeny; Reproducibility of Results

Publication Info

Year: 2020
Type: article
Volume: 38
Issue: 9
Pages: 1079-1086
Citations: 1458
Access: Closed

Citation Metrics

OpenAlex: 1458
Influential: 134

Cite This

Donovan H. Parks, Maria Chuvochina, Pierre-Alain Chaumeil et al. (2020). A complete domain-to-species taxonomy for Bacteria and Archaea. Nature Biotechnology, 38(9), 1079-1086. https://doi.org/10.1038/s41587-020-0501-8

Identifiers

DOI: 10.1038/s41587-020-0501-8
PMID: 32341564
