Abstract

Motivation: Current genomic sequence assemblers assume that the input data is derived from a single, homogeneous source. However, recent whole-genome shotgun sequencing projects have violated this assumption, resulting in input fragments that cover the same region of the genome but differ in sequence because of polymorphic variation in the population. While single-nucleotide polymorphisms (SNPs) do not pose a significant problem to state-of-the-art assembly methods, these methods do not handle insertion/deletion (indel) polymorphisms of more than a few bases.

Results: This paper describes an efficient method for detecting sequence discrepancies due to polymorphism that avoids resorting to global use of more costly, less stringent affine sequence alignments. Instead, the algorithm uses graph-based methods to determine the small set of fragments involved in each polymorphism and performs more sophisticated alignments only among fragments in that set. Results from the incorporation of this method into the Celera Assembler are reported for the D. melanogaster, H. sapiens, and M. musculus genomes.

Availability: The method described herein does not constitute a stand-alone software application, but is laid out in sufficient detail to be implemented as a component of any genomic sequence assembler.

Contact: daniel.fasulo@celera.com

Keywords: whole-genome assembly; shotgun sequencing; polymorphism.
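The abstract contrasts stringent alignments with costlier affine alignments that tolerate multi-base indels. As a rough illustration only, the sketch below scores a global alignment with affine gap penalties using the standard three-matrix (Gotoh-style) dynamic program. It is not the Celera Assembler's implementation; the function name `affine_global_score` and all scoring parameters are assumptions chosen for the example.

```python
NEG_INF = float("-inf")

def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-4, gap_extend=-1):
    """Global alignment score with affine gaps (illustrative sketch).

    A gap of length k costs gap_open + k * gap_extend, so one long indel
    is penalised less than many scattered single-base gaps.
    All parameter values here are hypothetical defaults.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap (bases of a missing from b)
    # Y: b[j-1] aligned to a gap (bases of b missing from a)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Either open a new gap or extend the one already open.
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

Under these assumed parameters, two fragments that differ by a single four-base indel (for example "ACGTACGT" against "ACGT") are charged one gap opening plus four extensions. The dynamic program fills three tables of size proportional to the product of the fragment lengths, which is in keeping with the abstract's point that such alignments are best restricted to the small set of fragments involved in each polymorphism.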

Keywords

Indel; Sequence assembly; Shotgun sequencing; Genome; Genetics; Computational biology; Biology; INDEL Mutation; Single-nucleotide polymorphism; Reference genome; Fragment (logic); Population; Computer science; Algorithm; Gene; Genotype

MeSH Terms

Algorithms; Base Sequence; Consensus Sequence; DNA Fragmentation; Gene Expression Profiling; Genetic Variation; Molecular Sequence Data; Polymorphism, Genetic; Polymorphism, Restriction Fragment Length; Sequence Alignment; Sequence Analysis, DNA

Related Publications

The Phusion Assembler

The Phusion assembler has assembled the mouse genome from the whole-genome shotgun (WGS) dataset collected by the Mouse Genome Sequencing Consortium, at ∼7.5× sequence coverage,...

2002 Genome Research 220 citations

Publication Info

Year: 2002
Type: article
Volume: 18
Issue: suppl_1
Pages: S294-S302
Citations: 23
Access: Closed

Citation Metrics

OpenAlex: 23
Influential: 0
CrossRef: 19

Cite This

Daniel Fasulo, Aaron L. Halpern, Ian Dew et al. (2002). Efficiently detecting polymorphisms during the fragment assembly process. Bioinformatics, 18(suppl_1), S294-S302. https://doi.org/10.1093/bioinformatics/18.suppl_1.s294

Identifiers

DOI: 10.1093/bioinformatics/18.suppl_1.s294
PMID: 12169559
