Abstract
Abstract Motivation: The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Results: Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. Availability: The software is available at http://icorn.sourceforge.net Contact: tdo@sanger.ac.uk; cnewbold@hammer.imm.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Keywords
Affiliated Institutions
Related Publications
Assembling millions of short DNA sequences using SSAKE
Abstract Summary: Novel DNA sequencing technologies with the potential for up to three orders magnitude more sequence throughput than conventional Sanger sequencing are emerging...
Error filtering, pair assembly and error correction for next-generation sequencing reads
Abstract Motivation: Next-generation sequencing produces vast amounts of data with errors that are difficult to distinguish from true biological variation when coverage is low. ...
Fast and accurate short read alignment with Burrows–Wheeler transform
Abstract Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A...
QUAST: quality assessment tool for genome assemblies
Abstract Summary: Limitations of genome sequencing techniques have led to dozens of assembly algorithms, none of which is perfect. A number of methods for comparing assemblers h...
SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing
The latest revolution in the DNA sequencing field has been brought about by the development of automated sequencers that are capable of generating giga base pair data sets quick...
Publication Info
- Year
- 2010
- Type
- article
- Volume
- 26
- Issue
- 14
- Pages
- 1704-1707
- Citations
- 222
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1093/bioinformatics/btq269