Abstract

Motivation: Next-generation sequencing produces vast amounts of data with errors that are difficult to distinguish from true biological variation when coverage is low.

Results: We demonstrate large reductions in error frequencies, especially for high-error-rate reads, by three independent means: (i) filtering reads according to their expected number of errors, (ii) assembling overlapping read pairs and (iii) for amplicon reads, by exploiting unique sequence abundances to perform error correction. We also show that most published paired read assemblers calculate incorrect posterior quality scores.

Availability and implementation: These methods are implemented in the USEARCH package. Binaries are freely available at http://drive5.com/usearch.

Contact: robert@drive5.com

Supplementary information: Supplementary data are available at Bioinformatics online.
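As an illustration of method (i), the expected number of errors in a read can be computed by summing the per-base error probabilities implied by its Phred quality scores, using the standard relation P = 10^(-Q/10). The sketch below is not the USEARCH implementation; the function names, the Phred+33 ASCII encoding and the threshold of 1.0 expected errors are assumptions chosen for the example.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for one read.

    Assumes `quality` is a FASTQ quality string in Phred+33 encoding
    (an assumption for this sketch; other offsets exist).
    """
    # Phred score Q maps to error probability 10^(-Q/10).
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def filter_reads(reads, max_ee: float = 1.0):
    """Keep (sequence, quality) pairs whose expected errors are <= max_ee.

    The default threshold of 1.0 is illustrative, not a recommendation
    taken from the paper.
    """
    return [(seq, qual) for seq, qual in reads if expected_errors(qual) <= max_ee]


if __name__ == "__main__":
    reads = [
        ("ACGT", "IIII"),  # four Q40 bases: very low expected errors
        ("ACGT", "####"),  # four Q2 bases: more than one expected error
    ]
    for seq, qual in filter_reads(reads):
        print(seq, qual, expected_errors(qual))
```

Filtering on the summed probabilities, rather than on the average quality score, reflects that a few very low-quality bases can dominate the expected error count even when the mean Q looks acceptable.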

Keywords

Computer science; Word error rate; Error detection and correction; Sequence (biology); Amplicon sequencing; Amplicon; Software; Algorithm; Data mining; Artificial intelligence; Biology; Genetics; Programming language

Publication Info

Year: 2015
Type: article
Volume: 31
Issue: 21
Pages: 3476-3482
Citations: 1267
Access: Closed

Cite This

R. C. Edgar, Henrik Flyvbjerg (2015). Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics, 31(21), 3476-3482. https://doi.org/10.1093/bioinformatics/btv401

Identifiers

DOI: 10.1093/bioinformatics/btv401