Abstract

Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, we developed ABySS (Assembly By Short Sequences), a parallelized sequence assembler. As a demonstration of the capability of our software, we assembled 3.5 billion paired-end reads from the genome of an African male publicly released by Illumina, Inc. Approximately 2.76 million contigs ≥100 base pairs (bp) in length were created with an N50 size of 1499 bp, representing 68% of the reference human genome. Analysis of these contigs identified polymorphic and novel sequences not present in the human reference assembly, which were validated by alignment to alternate human assemblies and to other primate genomes.
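
The abstract reports contig quality as an N50 size over contigs of at least 100 bp. The sketch below illustrates one common definition of N50: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembled length. The function name `n50`, the `min_length` parameter, and the toy lengths are illustrative assumptions. The abstract does not say which variant of the statistic the authors used, so this is not a description of how ABySS computes it.

```python
def n50(contig_lengths, min_length=100):
    """Return the N50 of the contigs that are at least min_length long.

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the running total first reaches
    half of the summed length of all retained contigs.
    """
    # Keep only contigs that meet the length cutoff, longest first.
    lengths = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not lengths:
        raise ValueError("no contigs meet the minimum length")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up contig lengths (not data from the paper).
# The 50 bp contig is dropped by the cutoff; the rest sum to 1000 bp.
print(n50([50, 100, 200, 300, 400]))  # prints 300
```

Because N50 depends on which contigs are counted, the length cutoff (≥100 bp in the abstract) should always be reported alongside the value.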

Keywords

Contig, Sequence assembly, Biology, Hybrid genome assembly, Massive parallel sequencing, Genome, Reference genome, Human genome, Computational biology, DNA sequencing, Sequence (biology), Genetics, Software, Massively parallel, Computer science, DNA, Parallel computing, Gene, Programming language

Publication Info

Year: 2009
Type: article
Volume: 19
Issue: 6
Pages: 1117-1123
Citations: 3668
Access: Closed

Citation Metrics

3668
OpenAlex

Cite This

Jared T. Simpson, Kim Wong, Shaun D. Jackman et al. (2009). ABySS: A parallel assembler for short read sequence data. Genome Research, 19(6), 1117-1123. https://doi.org/10.1101/gr.089532.108

Identifiers

DOI: 10.1101/gr.089532.108