The fragment assembly string graph

Abstract

We present a concept and formalism, the string graph, which represents all that is inferable about a DNA sequence from a collection of shotgun sequencing reads collected from it. We give time- and space-efficient algorithms for constructing a string graph given the collection of overlaps between the reads and, in particular, present a novel linear expected time algorithm for transitive reduction in this context. The result demonstrates that the decomposition of reads into k-mers employed in the de Bruijn graph approach described earlier is not essential, and exposes its close connection to the unitig approach we developed at Celera. This paper is a preliminary piece giving the basic algorithm and results that demonstrate the efficiency and scalability of the method. These ideas are being used to build a next-generation whole genome assembler called BOA (Berkeley Open Assembler) that will easily scale to mammalian genomes. Contact: gene@eecs.berkeley.edu
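The transitive reduction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the linear expected time algorithm from the paper; it is a naive version that shows only what the reduction removes. The graph representation (a dict mapping each read to its out-edges with overhang lengths), the function name `transitive_reduction`, and the four-read example are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical representation: graph[v][w] is the overhang length of the
# overlap edge v -> w (how far read w extends past the end of read v).
OverlapGraph = Dict[str, Dict[str, int]]


def transitive_reduction(graph: OverlapGraph) -> OverlapGraph:
    """Naive sketch: drop edge v -> x when some w gives v -> w -> x
    with matching total length. All checks use the original graph,
    so the result does not depend on iteration order."""
    reduced = {v: dict(out) for v, out in graph.items()}
    for v, out_v in graph.items():
        for w, len_vw in out_v.items():
            for x, len_wx in graph.get(w, {}).items():
                # v -> x is implied by v -> w and w -> x, so it is redundant.
                if out_v.get(x) == len_vw + len_wx:
                    reduced[v].pop(x, None)
    return reduced


# Four mutually overlapping reads starting at offsets 0, 3, 5 and 8.
overlaps: OverlapGraph = {
    "A": {"B": 3, "C": 5, "D": 8},
    "B": {"C": 2, "D": 5},
    "C": {"D": 3},
    "D": {},
}

reduced = transitive_reduction(overlaps)
print(reduced)
```

In this toy input every read overlaps every later read, and the reduction keeps only the chain A, B, C, D. A real overlap graph would also need to handle both strands of each read and small length discrepancies between overlaps, which this sketch ignores.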

Keywords

Fragment (logic); Computer science; Graph; String (physics); Theoretical computer science; Computational biology; Programming language; Biology; Physics; Theoretical physics

Affiliated Institutions

University of California, Berkeley US

Related Publications

An Eulerian path approach to DNA fragment assembly

Pavel A. Pevzner , Haixu Tang , Michael S. Waterman

For the last 20 years, fragment assembly in DNA sequencing followed the “overlap–layout–consensus” paradigm that is used in all currently available assembly tools. Although this...

2001 Proceedings of the National Academy o... 1358 citations

A parallel graph decomposition algorithm for DNA sequencing with nanopores

Shahid H. Bokhari , J.R. Sauer

Abstract Motivation: With the potential availability of nanopore devices that can sense the bases of translocating single-stranded DNA (ssDNA), it is likely that ‘reads’ of leng...

2004 Bioinformatics 18 citations

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

Daniel R. Zerbino , Ewan Birney

We have developed a new set of algorithms, collectively called “Velvet,” to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representat...

2008 Genome Research 9539 citations

IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth

Yu Peng , Henry C. M. Leung , Siu‐Ming Yiu +1 more

Abstract Motivation: Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. How...

2012 Bioinformatics 3099 citations

Colored de Bruijn Graphs and the Genome Halving Problem

Max A. Alekseyev , Pavel A. Pevzner

Breakpoint graph analysis is a key algorithmic technique in studies of genome rearrangements. However, breakpoint graphs are defined only for genomes without duplicated genes, t...

2007 IEEE/ACM Transactions on Computationa... 45 citations

Publication Info

Year: 2005
Type: article
Volume: 21
Issue: suppl_2
Pages: ii79-ii85
Citations: 431
Access: Closed

Cite This

APA Style

Eugene W. Myers (2005). The fragment assembly string graph. Bioinformatics, 21(suppl_2), ii79-ii85. https://doi.org/10.1093/bioinformatics/bti1114

Identifiers

DOI: 10.1093/bioinformatics/bti1114