Abstract

Motivation: Current DNA sequencing technology produces reads of about 500–750 bp, with typical coverage under 10×. New sequencing technologies are emerging that produce shorter reads (80–200 bp) but allow one to generate significantly higher coverage (30× and higher) at low cost. Modern assembly programs and error correction routines have been tuned to work well with current read technology but were not designed for assembly of short reads.

Results: We analyze the limitations of assembling reads generated by these new technologies and present a routine for base-calling in reads prior to their assembly. We demonstrate that while it is feasible to assemble such short reads, the resulting contigs will require significant (if not prohibitive) finishing efforts.

Availability: Available from the web at http://www.cse.ucsd.edu/groups/bioinformatics/software.html
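The trade-off between read length and coverage can be made concrete with the standard definition of average coverage, c = N·L/G, where N is the number of reads, L the read length and G the genome length. The sketch below is an illustration only, not part of the paper's software; the function names and the 5 Mb genome size are hypothetical choices for the example.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage c = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads N such that N * L / G >= target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    genome = 5_000_000  # hypothetical 5 Mb genome, chosen only for illustration

    # Long reads at low coverage vs. short reads at high coverage,
    # using values inside the ranges quoted in the abstract.
    long_reads = reads_needed(8, 600, genome)
    short_reads = reads_needed(30, 100, genome)

    print("600 bp reads at 8x:", long_reads, "reads,", long_reads * 600, "bp sequenced")
    print("100 bp reads at 30x:", short_reads, "reads,", short_reads * 100, "bp sequenced")
    print("check coverage:", coverage(short_reads, 100, genome))
```

With these assumed numbers, 8× coverage with 600 bp reads needs 66,667 reads, whereas 30× with 100 bp reads needs 1,500,000 reads: many more, much shorter fragments, which is the regime the abstract describes.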

Keywords

Contig, Sequence assembly, Computer science, Software, Base (topology), Hybrid genome assembly, DNA sequencing, Nanopore sequencing, Data mining, Operating system, DNA, Biology, Genome, Genetics, Mathematics

Publication Info

Year: 2004
Type: article
Volume: 20
Issue: 13
Pages: 2067–2074
Citations: 213
Access: Closed

Cite This

Mark Chaisson, Pavel A. Pevzner, Haixu Tang (2004). Fragment assembly with short reads. Bioinformatics, 20(13), 2067–2074. https://doi.org/10.1093/bioinformatics/bth205

Identifiers

DOI: 10.1093/bioinformatics/bth205