Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges

Abstract

Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. We discuss them as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
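To make the four-stage framework concrete, the following is a minimal toy sketch in Python, not code from the review or from any published assembler. It assumes a de Bruijn graph as the graph model (only one of several possible choices), error-free overlaps after filtering, and a single strand. All function names and parameters (`preprocess`, `build_graph`, `simplify`, `postprocess`, `assemble`, `min_count`, `min_len`) are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def preprocess(reads, k, min_count=2):
    """Stage 1 (preprocessing filter): drop reads that contain a rare k-mer.

    Assumption: a k-mer seen fewer than min_count times is a likely error.
    """
    counts = Counter(km for r in reads for km in kmers(r, k))
    return [r for r in reads
            if len(r) >= k and all(counts[km] >= min_count for km in kmers(r, k))]


def build_graph(reads, k):
    """Stage 2 (graph construction): nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for r in reads:
        for km in kmers(r, k):
            graph[km[:-1]].add(km[1:])
    return graph


def simplify(graph):
    """Stage 3 (graph simplification): merge unbranched paths into contigs.

    Limitation of this sketch: isolated cycles with no branching node are skipped.
    """
    indeg = Counter(v for targets in graph.values() for v in targets)
    nodes = set(graph) | set(indeg)

    def is_simple(n):
        # Exactly one way in and one way out: the node can be merged.
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    contigs = []
    for u in sorted(nodes):
        if is_simple(u):
            continue  # interior node; reached from a path start instead
        for v in sorted(graph.get(u, ())):
            contig = u + v[-1]
            while is_simple(v):
                v = next(iter(graph[v]))
                contig += v[-1]
            contigs.append(contig)
    return contigs


def postprocess(contigs, min_len):
    """Stage 4 (postprocessing filter): discard contigs shorter than min_len."""
    return [c for c in contigs if len(c) >= min_len]


def assemble(reads, k=4, min_count=2, min_len=5):
    """Run the four stages in order; each stage only sees the previous output."""
    clean = preprocess(reads, k, min_count)
    graph = build_graph(clean, k)
    return postprocess(simplify(graph), min_len)


if __name__ == "__main__":
    # Four overlapping reads from a made-up sequence plus one read with an error.
    reads = ["ATGGCGT", "GGCGTGCA", "ATGGCGTG", "GCGTGCA", "ATGGCTTG"]
    print(assemble(reads))
```

Keeping each stage behind its own function is one simple way to read the layered architecture idea: a different filter or graph model could be swapped in without touching the other stages. Real assemblers additionally handle reverse complements, sequencing-error bubbles and tips, paired-end information, and memory-efficient k-mer storage, none of which this sketch attempts.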

Keywords

Computer science, Preprocessor, DNA sequencing, Sequence assembly, Software, Graph, Process (computing), Theoretical computer science, Artificial intelligence, Biology, Programming language, DNA, Gene

Publication Info

Year: 2013
Type: review
Volume: 9
Issue: 12
Pages: e1003345-e1003345
Citations: 144
Access: Closed

Cite This

APA Style

Sara El‐Metwally, Taher Hamza, Magdi Zakaria et al. (2013). Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges. PLoS Computational Biology, 9(12), e1003345. https://doi.org/10.1371/journal.pcbi.1003345

Identifiers

DOI: 10.1371/journal.pcbi.1003345