Abstract

Motivation: Sequencing projects increasingly target samples from non-clonal sources. In particular, metagenomics has enabled scientists to begin to characterize the structure of microbial communities. The software tools developed for assembling and analyzing sequencing data for clonal organisms are, however, unable to adequately process data derived from non-clonal sources.

Results: We present a new scaffolder, Bambus 2, to address some of the challenges encountered when analyzing metagenomes. Our approach relies on a combination of a novel method for detecting genomic repeats and algorithms that analyze assembly graphs to identify biologically meaningful genomic variants. We compare our software to current assemblers using simulated and real data. We demonstrate that the repeat detection algorithms have higher sensitivity than current approaches without sacrificing specificity. In metagenomic datasets, the scaffolder avoids false joins between distantly related organisms while obtaining long-range contiguity. Bambus 2 represents a first step toward automated metagenomic assembly.

Availability: Bambus 2 is open source and available from http://amos.sf.net.

Contact: mpop@umiacs.umd.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.

Keywords

Metagenomics, Software, Joins, Computer science, Computational biology, Data mining, Biology, Genetics, Gene, Programming language

MeSH Terms

Algorithms; Metagenome; Metagenomics; Sequence Analysis, DNA; Software

Publication Info

Year: 2011
Type: article
Volume: 27
Issue: 21
Pages: 2964-2971
Citations: 133
Access: Closed

Citation Metrics

OpenAlex: 133
Influential: 6
CrossRef: 115

Cite This

Sergey Koren, Todd J. Treangen, Mihai Pop (2011). Bambus 2: scaffolding metagenomes. Bioinformatics, 27(21), 2964-2971. https://doi.org/10.1093/bioinformatics/btr520

Identifiers

DOI: 10.1093/bioinformatics/btr520
PMID: 21926123
PMCID: PMC3198580
