Abstract

Centrifuge is a novel microbial classification engine that enables rapid, accurate and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4,078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI non-redundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more space. Centrifuge is available as free, open-source software from www.ccb.jhu.edu/software/centrifuge.
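To make the indexing idea concrete, the following is a minimal textbook sketch of a BWT and FM-index backward search in Python. It is not Centrifuge's implementation and includes none of its metagenomics-specific optimizations; the function names (`bwt`, `build_fm_index`, `count_occurrences`) and the toy sequences are assumptions chosen for illustration, and the naive rotation sort is only suitable for very short strings.

```python
from collections import Counter


def bwt(text):
    """Return the Burrows-Wheeler transform of text, using '$' as end sentinel."""
    assert "$" not in text
    s = text + "$"
    # Naive construction: sort every rotation (fine for a toy example only).
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(bwt_str):
    """Build the two tables needed for backward search (hypothetical helper)."""
    counts = Counter(bwt_str)
    # c_table[ch] = number of characters in the text strictly smaller than ch
    c_table = {}
    total = 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i] = number of occurrences of ch in bwt_str[:i]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return c_table, occ


def count_occurrences(pattern, c_table, occ, n):
    """Count exact matches of pattern, scanning it from last char to first."""
    lo, hi = 0, n  # half-open range of rows in the sorted-rotation matrix
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


# Toy usage with a made-up sequence (not data from the paper).
genome = "GATTACAGATTACA"
transformed = bwt(genome)
c_table, occ = build_fm_index(transformed)
hits = count_occurrences("GATTACA", c_table, occ, len(transformed))
```

Each step of the backward search narrows a range of rows using only the two lookup tables, so the query cost depends on the pattern length rather than on the size of the indexed text. Production FM indexes additionally sample the occurrence table and compress the BWT to keep the index small.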

Keywords

Centrifuge; Metagenomics; Search engine indexing; Computer science; Index (typography); Software; Data mining; Process (computing); Information retrieval; Biology; Operating system; World Wide Web

Publication Info

Year: 2016
Type: preprint
Citations: 109
Access: Closed

Citation Metrics

OpenAlex: 109

Cite This

Daehwan Kim, Li Song, Florian P. Breitwieser et al. (2016). Centrifuge: rapid and sensitive classification of metagenomic sequences. bioRxiv (Cold Spring Harbor Laboratory). https://doi.org/10.1101/054965

Identifiers

DOI: 10.1101/054965