Abstract

Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness of genomic data sets in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). The latest software release implements a complete refactoring of the code to make it more flexible and extendable to facilitate high-throughput assessments. The original six lineage assessment data sets have been updated with improved species sampling, 34 new subsets have been built for vertebrates, arthropods, fungi, and prokaryotes that greatly enhance resolution, and data sets are now also available for nematodes, protists, and plants. Here, we present BUSCO v3 with example analyses that highlight the wide-ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.
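The completeness measure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the BUSCO implementation. The names `classify` and `busco_summary` are hypothetical. The sketch also assumes that each BUSCO gene's hits have already been labelled full-length or partial. It shows how per-gene outcomes (complete single-copy, complete duplicated, fragmented, missing) roll up into the one-line `C:[S,D],F,M,n` percentage summary that BUSCO reports.

```python
from collections import Counter
from typing import Dict

# The four BUSCO outcome classes: complete single-copy (S),
# complete duplicated (D), fragmented (F) and missing (M).
STATUSES = ("S", "D", "F", "M")


def classify(complete_hits: int, fragment_hits: int) -> str:
    """Assign a BUSCO outcome class from hit counts (hypothetical helper).

    Simplifying assumption: each hit has already been judged full-length
    (complete) or partial (fragment); the real tool makes that call with
    lineage-specific score and length cutoffs that are not modelled here.
    """
    if complete_hits == 1:
        return "S"  # exactly one full-length copy
    if complete_hits > 1:
        return "D"  # full-length, but found more than once
    if fragment_hits > 0:
        return "F"  # only partial matches
    return "M"      # no acceptable match at all


def busco_summary(status: Dict[str, str]) -> str:
    """Format per-gene outcome classes as a one-line completeness summary."""
    n = len(status)
    if n == 0:
        raise ValueError("no BUSCO genes supplied")
    counts = Counter(status.values())  # missing classes count as 0
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected status labels: {sorted(unknown)}")

    def pct(count: int) -> float:
        return 100.0 * count / n

    complete = counts["S"] + counts["D"]  # C is the sum of S and D
    return (
        f"C:{pct(complete):.1f}%[S:{pct(counts['S']):.1f}%,D:{pct(counts['D']):.1f}%],"
        f"F:{pct(counts['F']):.1f}%,M:{pct(counts['M']):.1f}%,n:{n}"
    )


if __name__ == "__main__":
    # Hypothetical hit counts (complete, fragment) for five BUSCO genes.
    hits = {"b1": (1, 0), "b2": (1, 2), "b3": (2, 0), "b4": (0, 1), "b5": (0, 0)}
    status = {gene: classify(*h) for gene, h in hits.items()}
    print(busco_summary(status))
```

Running the module prints `C:60.0%[S:40.0%,D:20.0%],F:20.0%,M:20.0%,n:5`. Duplicated BUSCOs are counted inside the complete total but are also reported separately. This lets the score hint at possible redundancy in an assembly as well as at missing content.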

Keywords

Phylogenomics, Biology, Genomics, Genome, Metagenomics, Code refactoring, Data quality, Computational biology, Software, Gene, Computer science, Phylogenetics, Genetics, Metric (unit), Clade

Publication Info

Year: 2017
Type: article
Volume: 35
Issue: 3
Pages: 543-548
Citations: 2288 (OpenAlex)
Access: Closed

Cite This

Robert M. Waterhouse, Mathieu Seppey, Felipe A. Simão et al. (2017). BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics. Molecular Biology and Evolution, 35(3), 543-548. https://doi.org/10.1093/molbev/msx319

Identifiers

DOI: 10.1093/molbev/msx319