Abstract
The Phusion assembler has assembled the mouse genome from the whole-genome shotgun (WGS) dataset collected by the Mouse Genome Sequencing Consortium, at ∼7.5× sequence coverage, producing a high-quality draft assembly 2.6 gigabases in size, of which 90% of these bases are in 479 scaffolds. For the mouse genome, which is a large and repeat-rich genome, the input dataset was designed to include a high proportion of paired end sequences of various size selected inserts, from 2–200 kbp lengths, into various host vector templates. Phusion uses sequence data, called reads, and information about reads that share common templates, called read pairs, to drive the assembly of this large genome to highly accurate results. The preassembly stage, which clusters the reads into sensible groups, is a key element of the entire assembler, because it permits a simple approach to parallelization of the assembly stage, as each cluster can be treated independent of the others. In addition to the application of Phusion to the mouse genome, we will also present results from the WGS assembly of Caenorhabditis briggsae sequenced to about 11× coverage. The C. briggsae assembly was accessioned through EMBL, http://www.ebi.ac.uk/services/index.html , using the series CAAC01000001–CAAC01000578, however, the Phusion mouse assembly described here was not accessioned. The mouse data was generated by the Mouse Genome Sequencing Consortium. The C. briggsae sequence was generated at The Wellcome Trust Sanger Institute and the Genome Sequencing Center, Washington University School of Medicine.
Keywords
Affiliated Institutions
Related Publications
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitation...
The Atlas Genome Assembly System
Atlas is a suite of programs developed for assembly of genomes by a “combined approach” that uses DNA sequence reads from both BACs and whole-genome shotgun (WGS) libraries. The...
The Sequence of the Human Genome
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp ...
PCAP: A Whole-Genome Assembly Program
We describe a whole-genome assembly program named PCAP for processing tens of millions of reads. The PCAP program has several features to address efficiency and accuracy issues ...
GAGE: A critical evaluation of genome assemblies and assembly algorithms
New sequencing technology has dramatically altered the landscape of whole-genome sequencing, allowing scientists to initiate numerous projects to decode the genomes of previousl...
Publication Info
- Year
- 2002
- Type
- article
- Volume
- 13
- Issue
- 1
- Pages
- 81-90
- Citations
- 220
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1101/gr.731003