Whole-genome shotgun assembly and comparison of human genome assemblies

2004 Proceedings of the National Academy of Sciences 184 citations

Abstract

We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304–1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860–921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.
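The coverage figures in the abstract follow from simple arithmetic, which the sketch below illustrates. It is not taken from the paper: the genome size of about 2.9 Gbp is an assumed round value, the function names are hypothetical, and the library mix passed to `clone_coverage` is a toy input, because the abstract does not give the number of read pairs per insert size.

```python
# Minimal sketch of shotgun coverage arithmetic (illustrative only).
# ASSUMPTION: genome size of ~2.9 Gbp; the abstract does not state the value used.
ASSUMED_GENOME_SIZE_BP = 2_900_000_000

# Figures quoted in the abstract.
NUM_READS = 27_000_000
SEQUENCE_COVERAGE = 5.3


def sequence_coverage(num_reads, mean_read_len_bp, genome_size_bp):
    """Sequence coverage = total bases in reads / genome size."""
    return num_reads * mean_read_len_bp / genome_size_bp


def implied_mean_read_length(num_reads, coverage, genome_size_bp):
    """Invert the coverage formula to get the mean trimmed read length."""
    return coverage * genome_size_bp / num_reads


def clone_coverage(pairs_by_insert_bp, genome_size_bp):
    """Clone coverage = total bases spanned by paired inserts / genome size.

    pairs_by_insert_bp maps insert size (bp) to the number of read pairs.
    """
    spanned = sum(size * pairs for size, pairs in pairs_by_insert_bp.items())
    return spanned / genome_size_bp


# Mean trimmed read length implied by 27 million reads at 5.3x,
# under the assumed genome size (a few hundred bp).
mean_len = implied_mean_read_length(
    NUM_READS, SEQUENCE_COVERAGE, ASSUMED_GENOME_SIZE_BP
)
print(f"implied mean trimmed read length: {mean_len:.0f} bp")

# Toy library mix (hypothetical numbers, NOT the Celera data set).
toy_mix = {2_000: 1_000, 10_000: 500, 50_000: 100}
print(f"toy clone coverage: {clone_coverage(toy_mix, 1_000_000):.1f}x")
```

Clone coverage exceeds sequence coverage because each pair of reads spans an insert far longer than the two reads themselves, which is why long-insert libraries contribute most of the spanning information even when they contribute few reads.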

Keywords

Shotgun sequencing, Genome, Sequence assembly, Reference genome, Hybrid genome assembly, Human genome, Computational biology, Biology, Whole genome sequencing, Shotgun, Genetics, Genome project, Gene

Publication Info

Year: 2004
Type: article
Volume: 101
Issue: 7
Pages: 1916-1921
Citations: 184
Access: Closed

Cite This

Sorin Istrail, Granger G. Sutton, Liliana Florea et al. (2004). Whole-genome shotgun assembly and comparison of human genome assemblies. Proceedings of the National Academy of Sciences, 101(7), 1916-1921. https://doi.org/10.1073/pnas.0307971100

Identifiers

DOI: 10.1073/pnas.0307971100