Abstract

Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, such as their high throughput and relatively low cost, assembling the reads they produce remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, graph construction, graph simplification, and postprocessing filtering. We discuss these as a four-stage framework for data analysis and processing, and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges facing current assemblers in the next-generation environment in order to determine the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
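
The four stages named in the abstract can be illustrated with a toy pipeline. The following is a minimal sketch, not the method of any assembler surveyed in the review: it assumes a de Bruijn graph as the graph model, error-free reads from a single strand, and hypothetical function names (`preprocess`, `build_graph`, `simplify`, `postprocess`, `assemble`) and parameters (`k`, `min_len`) chosen only for illustration.

```python
from collections import defaultdict


def preprocess(reads, k):
    """Stage 1 (toy filter): drop reads shorter than k or with non-ACGT symbols."""
    return [r for r in reads if len(r) >= k and set(r) <= set("ACGT")]


def build_graph(reads, k):
    """Stage 2: de Bruijn graph; nodes are (k-1)-mers, edges are distinct k-mers."""
    graph = defaultdict(set)
    for r in reads:
        for i in range(len(r) - k + 1):
            kmer = r[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def simplify(graph):
    """Stage 3: merge non-branching paths into contig strings.

    Isolated cycles (every node has one way in and one way out) are ignored
    in this sketch.
    """
    indeg = defaultdict(int)
    for succs in graph.values():
        for v in succs:
            indeg[v] += 1
    nodes = set(graph) | set(indeg)

    def is_simple(n):
        # Exactly one incoming and one outgoing edge.
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    contigs = []
    for u in sorted(nodes):
        if is_simple(u):
            continue  # contigs start only at branching or terminal nodes
        for v in sorted(graph.get(u, ())):
            contig = u + v[-1]
            while is_simple(v):
                (v,) = graph[v]  # follow the single outgoing edge
                contig += v[-1]
            contigs.append(contig)
    return contigs


def postprocess(contigs, min_len):
    """Stage 4 (toy filter): keep contigs of at least min_len bases."""
    return [c for c in contigs if len(c) >= min_len]


def assemble(reads, k=4, min_len=5):
    """Run the four stages in order."""
    clean = preprocess(reads, k)
    graph = build_graph(clean, k)
    return postprocess(simplify(graph), min_len)


if __name__ == "__main__":
    # The third read contains an ambiguous base and is filtered out in stage 1.
    print(assemble(["ATGGCGT", "GGCGTGC", "ATGNCGT"]))
```

Real assemblers replace each toy stage with far more elaborate logic (error correction, handling of both strands, tip and bubble removal, scaffolding); the sketch only shows how the stages hand data to one another.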

Keywords

Computer science, Preprocessor, DNA sequencing, Sequence assembly, Software, Graph, Process (computing), Theoretical computer science, Artificial intelligence, Biology, Programming language, DNA, Gene

Publication Info

Year: 2013
Type: review
Volume: 9
Issue: 12
Pages: e1003345-e1003345
Citations: 144
Access: Closed

Cite This

Sara El‐Metwally, Taher Hamza, Magdi Zakaria et al. (2013). Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges. PLoS Computational Biology, 9(12), e1003345-e1003345. https://doi.org/10.1371/journal.pcbi.1003345

Identifiers

DOI
10.1371/journal.pcbi.1003345