Toward Simplifying and Accurately Formulating Fragment Assembly

1995 Journal of Computational Biology 249 citations

Abstract

The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally, the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is very important as the underlying problems are NP-hard. In practice, the transformed problems are so small that simple branch-and-bound algorithms successfully solve them, thus permitting auxiliary experimental information to be taken into account in the form of overlap, orientation, and distance constraints.

Keywords

SubstringFragment (logic)MathematicsGraphReduction (mathematics)AlgorithmString (physics)Computer scienceMathematical optimizationCombinatoricsData structure

MeSH Terms

Base SequenceDNAMathematicsModelsStatisticalOligodeoxyribonucleotidesProbabilityReproducibility of Results

Affiliated Institutions

Related Publications

Publication Info

Year
1995
Type
article
Volume
2
Issue
2
Pages
275-290
Citations
249
Access
Closed

Social Impact

Social media, news, blog, policy document mentions

Citation Metrics

249
OpenAlex
19
Influential
171
CrossRef

Cite This

Eugene W. Myers (1995). Toward Simplifying and Accurately Formulating Fragment Assembly. Journal of Computational Biology , 2 (2) , 275-290. https://doi.org/10.1089/cmb.1995.2.275

Identifiers

DOI
10.1089/cmb.1995.2.275
PMID
7497129

Data Quality

Data completeness: 86%