Abstract

We have developed a new set of algorithms, collectively called “Velvet,” to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high-coverage, very short read (25–50 bp) data sets. Applying Velvet to very short reads and paired-end information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of ∼8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.
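To make the idea of a k-mer-based graph concrete, the following is a minimal sketch of the textbook de Bruijn construction, in which nodes are (k-1)-mers and each k-mer observed in a read contributes one edge. It is an illustration only and not Velvet's own data structure: the function name `build_de_bruijn`, the toy reads, and the choice k = 3 are assumptions made here, and the sketch ignores reverse complements, sequencing errors, and read pairs.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a textbook de Bruijn graph from a list of reads.

    Hypothetical helper for illustration (not part of Velvet).
    Nodes are (k-1)-mers; every k-mer occurrence in a read adds one
    edge from its (k-1)-length prefix to its (k-1)-length suffix.
    Repeated k-mers produce repeated edges, so edge multiplicity
    reflects coverage.
    """
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


# Two overlapping toy reads (assumed data, far shorter than real reads).
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)
print(dict(graph))
```

For these two reads the printed adjacency is `{'AC': ['CG'], 'CG': ['GT', 'GT'], 'GT': ['TC', 'TC'], 'TC': ['CA']}`. The doubled edges mark k-mers seen in both reads, and the single unbranched path AC, CG, GT, TC, CA spells the merged sequence ACGTCA.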

Keywords

Contig, De Bruijn sequence, Velvet, De Bruijn graph, Biology, k-mer, Sequence assembly, Algorithm, Computer science, Computational biology, Genetics, Combinatorics, DNA sequencing, Genome, Mathematics, Gene

Publication Info

Year: 2008
Type: article
Volume: 18
Issue: 5
Pages: 821-829
Citations: 9539
Access: Closed

Cite This

Daniel R. Zerbino, Ewan Birney (2008). Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 18(5), 821-829. https://doi.org/10.1101/gr.074492.107

Identifiers

DOI: 10.1101/gr.074492.107