Abstract
Motivation: Next-generation sequencing produces vast amounts of data with errors that are difficult to distinguish from true biological variation when coverage is low.
Results: We demonstrate large reductions in error frequencies, especially for high-error-rate reads, by three independent means: (i) filtering reads according to their expected number of errors, (ii) assembling overlapping read pairs and (iii) for amplicon reads, by exploiting unique sequence abundances to perform error correction. We also show that most published paired read assemblers calculate incorrect posterior quality scores.
Availability and implementation: These methods are implemented in the USEARCH package. Binaries are freely available at http://drive5.com/usearch.
Contact: robert@drive5.com
Supplementary information: Supplementary data are available at Bioinformatics online.
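Filtering by expected number of errors, item (i) in the Results, can be illustrated with a minimal sketch. It assumes the standard Phred relation, in which a quality score Q implies an error probability of 10^(-Q/10), and takes the expected number of errors in a read to be the sum of those per-base probabilities. The function names, the Phred+33 ASCII offset and the threshold `max_ee = 1.0` are illustrative assumptions, not taken from the abstract or from the USEARCH implementation.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum the per-base error probabilities implied by a FASTQ quality string.

    Assumes Phred+33 encoding by default: Q = ord(char) - 33, p = 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_filter(quality: str, max_ee: float = 1.0) -> bool:
    """Keep a read only if its expected error count is at most max_ee.

    max_ee is a hypothetical threshold chosen for illustration.
    """
    return expected_errors(quality) <= max_ee


# Two made-up quality strings: ten bases at Q40 ('I') and twenty bases at Q10 ('+').
high_quality = "I" * 10
low_quality = "+" * 20

for name, qual in [("high", high_quality), ("low", low_quality)]:
    print(name, round(expected_errors(qual), 4), passes_filter(qual))
```

Under these assumptions the Q40 read has about 0.001 expected errors and is kept, while the Q10 read has about 2 expected errors and is discarded. Because the sum is over every base, a few low-quality positions can reject a read whose average quality looks acceptable.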
Publication Info
- Year: 2015
- Type: article
- Volume: 31
- Issue: 21
- Pages: 3476-3482
- Citations: 1267
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/btv401