Abstract
The recent introduction of massively parallel pyrosequencers allows rapid, inexpensive analysis of microbial community composition using 16S ribosomal RNA (rRNA) sequences. A major challenge, however, is designing a workflow that assigns taxonomic information to each read accurately and rapidly, so that the composition of each community can be linked back to the likely ecological roles played by members of each species, genus, family or phylum. Here, we use three large 16S rRNA datasets to test whether taxonomic information based on full-length sequences can be recaptured by short reads that simulate pyrosequencer output. We find that taxonomic assignment methods vary radically in their ability to recapture the taxonomic information in full-length 16S rRNA sequences. Most methods are sensitive to the region of the 16S rRNA gene that is targeted for sequencing, but many combinations of method and rRNA region produce consistent and accurate results. To process large datasets of partial 16S rRNA sequences obtained from surveys of various microbial communities, including those from human body habitats, we recommend using the Greengenes or RDP classifier with fragments of at least 250 bases, starting from one of the primers R357, R534, R798, F343 or F517.
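The read-simulation test described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' pipeline: the function names are hypothetical, the number in a primer name is treated as a 0-based offset into the full-length sequence, and the taxonomic labels are taken as given from whatever classifier is used.

```python
# Illustrative sketch only: names, coordinates, and conventions are assumptions,
# not the workflow used in the study.

_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_read(full_length: str, primer: str, read_length: int = 250) -> str:
    """Cut a simulated short read out of a full-length 16S sequence.

    `primer` is a name such as "F343" or "R534". The number is treated as a
    0-based offset into `full_length` (an assumption for this sketch).
    """
    direction, offset = primer[0].upper(), int(primer[1:])
    if direction == "F":
        # Forward primer: the read extends downstream of the offset.
        return full_length[offset:offset + read_length]
    if direction == "R":
        # Reverse primer: the read extends upstream and is reported
        # on the opposite strand.
        start = max(0, offset - read_length)
        return reverse_complement(full_length[start:offset])
    raise ValueError(f"unrecognized primer name: {primer!r}")


def recapture_rate(full_taxa: list[str], read_taxa: list[str]) -> float:
    """Fraction of sequences whose short-read label matches the full-length label."""
    if len(full_taxa) != len(read_taxa):
        raise ValueError("label lists must have the same length")
    if not full_taxa:
        return 0.0
    matches = sum(a == b for a, b in zip(full_taxa, read_taxa))
    return matches / len(full_taxa)
```

In this sketch, each full-length sequence would be cut with `simulate_read`, both versions would be labeled by the same classifier at a chosen rank, and `recapture_rate` would summarize how often the short read recovers the full-length assignment for a given primer and method.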
Publication Info
- Year: 2008
- Type: article
- Volume: 36
- Issue: 18
- Pages: e120-e120
- Citations: 587
- Access: Closed
Identifiers
- DOI: 10.1093/nar/gkn491