Abstract

We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier work, we design a de Bruijn graph-based assembly algorithm that comes very close to achieving the lower bound for the repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of the Ukkonen-Pevzner necessary and sufficient conditions for Sequencing by Hybridization.
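For readers unfamiliar with the data structure named in the abstract, the following is a minimal sketch of how a de Bruijn graph can be built from a set of reads. It is not the authors' assembly algorithm: the function name `de_bruijn_graph`, the toy reads, and the choice k = 3 are all assumptions made for illustration. Each k-mer found in a read contributes one edge from its (k-1)-mer prefix to its (k-1)-mer suffix.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads (illustrative helper, not from the paper).

    Nodes are (k-1)-mers. Every k-mer in a read adds an edge from its
    prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy reads (hypothetical data) that overlap on "GTC".
reads = ["ACGTC", "GTCAG"]
graph = de_bruijn_graph(reads, 3)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy case every node has a single successor, so the graph is a simple path and the underlying sequence can be read off directly. Repeats in the genome are what create branching nodes, which is why the abstract expresses its bounds in terms of repeat statistics.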

Keywords

Shotgun sequencing; k-mer; Sequence assembly; Hybrid genome assembly; Shotgun; Computer science; Genome; Computational biology; DNA sequencing; Set (abstract data type); Biology; Algorithm; Genetics; DNA; Gene

MeSH Terms

Algorithms; Contig Mapping; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA

Publication Info

Year: 2013
Type: article
Volume: 14
Issue: S5
Pages: S18-S18
Citations: 90
Access: Closed

Citation Metrics

OpenAlex: 90
Influential: 7
CrossRef: 68

Cite This

Guy Bresler, Ma’ayan Bresler, David Tse (2013). Optimal assembly for high throughput shotgun sequencing. BMC Bioinformatics, 14(S5), S18-S18. https://doi.org/10.1186/1471-2105-14-s5-s18

Identifiers

DOI: 10.1186/1471-2105-14-s5-s18
PMID: 23902516
PMCID: PMC3706340
arXiv: 1301.0068
