Abstract
The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally, the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is very important as the underlying problems are NP-hard. In practice, the transformed problems are so small that simple branch-and-bound algorithms successfully solve them, thus permitting auxiliary experimental information to be taken into account in the form of overlap, orientation, and distance constraints.
Keywords
MeSH Terms
Affiliated Institutions
Related Publications
All Highest Scoring Paths in Weighted Grid Graphs and Their Application to Finding All Approximate Repeats in Strings
Weighted paths in directed grid graphs of dimension (m X n) can be used to model the string edit problem, which consists of obtaining optimal (weighted) alignments between subst...
An Algorithm for Locating Nonoverlapping Regions of Maximum Alignment Score
In this paper, we present an $O(N^2 \log ^2 )$ algorithm for finding the two nonoverlapping substrings of a given string of length N which have the highest-scoring alignment bet...
Worst-Case Analysis of a New Heuristic for the Travelling Salesman Problem
Abstract An O( n 3 ) heuristic algorithm is described for solving d -city travelling salesman problems (TSP) whose cost matrix satisfies the triangularity condition. The algorit...
An Eulerian path approach to DNA fragment assembly
For the last 20 years, fragment assembly in DNA sequencing followed the “overlap–layout–consensus” paradigm that is used in all currently available assembly tools. Although this...
A New Algorithm for DNA Sequence Assembly
Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-estab...
Publication Info
- Year
- 1995
- Type
- article
- Volume
- 2
- Issue
- 2
- Pages
- 275-290
- Citations
- 249
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1089/cmb.1995.2.275
- PMID
- 7497129