Abstract
Abstract Next-generation sequencing techniques, and PhyloChip, have made simultaneous phylogenetic analyses of hundreds of microbial communities possible. Insight into community structure has been limited by the inability to integrate and visualize such vast datasets. Fast UniFrac overcomes these issues, allowing integration of larger numbers of sequences and samples into a single analysis. Its new array-based implementation offers orders of magnitude improvements over the original version. New 3D visualization of principal coordinates analysis results, with the option to view multiple coordinate axes simultaneously, provides a powerful way to quickly identify patterns that relate vast numbers of microbial communities. We show the potential of Fast UniFrac using examples from three data types: Sanger-sequencing studies of diverse free-living and animal-associated bacterial assemblages and from the gut of obese humans as they diet, pyrosequencing data integrated from studies of the human hand and gut, and PhyloChip data from a study of citrus pathogens. We show that a Fast UniFrac analysis using a reference tree recaptures patterns that could not be detected without considering phylogenetic relationships and that Fast UniFrac, coupled with BLAST-based sequence assignment, can be used to quickly analyze pyrosequencing runs containing hundreds of thousands of sequences, showing patterns relating human and gut samples. Finally, we show that the application of Fast UniFrac to PhyloChip data could identify well-defined subcategories associated with infection. Together, these case studies point the way toward a broad range of applications and show some of the new features of Fast UniFrac.
Keywords
Affiliated Institutions
Related Publications
Genome sequencing in microfabricated high-density picolitre reactors
The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly...
Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample
The ongoing revolution in high-throughput sequencing continues to democratize the ability of small groups of investigators to map the microbial component of the biosphere. In pa...
Emerging technologies in DNA sequencing
Demand for DNA sequence information has never been greater, yet current Sanger technology is too costly, time consuming, and labor intensive to meet this ongoing demand. Applica...
NCBI GEO: archive for functional genomics data sets--10 years on
A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve ...
Error filtering, pair assembly and error correction for next-generation sequencing reads
Abstract Motivation: Next-generation sequencing produces vast amounts of data with errors that are difficult to distinguish from true biological variation when coverage is low. ...
Publication Info
- Year
- 2009
- Type
- article
- Volume
- 4
- Issue
- 1
- Pages
- 17-27
- Citations
- 1101
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1038/ismej.2009.97