A graph theoretic approach to the analysis of DNA sequencing data.

1996 Genome Research 40 citations

Abstract

The analysis of data from automated DNA sequencing instruments has been a limiting factor in the development of new sequencing technology. A new base-calling algorithm that is intended to be independent of any particular sequencing technology has been developed and shown to be effective with data from the Applied Biosystems 373 sequencing system. This algorithm makes use of a nonlinear deconvolution filter to detect likely oligomer events and a graph theoretic editing strategy to find the subset of those events that is most likely to correspond to the correct sequence. Metrics evaluating the quality and accuracy of the resulting sequence are also generated and have been shown to be predictive of measured error rates. Compared to the Applied Biosystems Analysis software, this algorithm generates 18% fewer insertion errors, 80% more deletion errors, and 4% fewer mismatches. The tradeoff between different types of errors can be controlled through a secondary editing step that inserts or deletes base calls depending on their associated confidence values.
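The abstract describes a two-stage pipeline. A detector proposes candidate base events, and a graph-based editing step then keeps the subset that best forms a sequence. The sketch below illustrates that second stage under stated assumptions. It is not the author's method: the abstract gives neither the graph construction nor the scoring.

Assumptions in this sketch:

- Candidate events are taken as given, so the nonlinear deconvolution filter is not implemented. Each event is reduced to a trace position, a base letter, and a confidence score.
- The names `Event`, `best_path` and `secondary_edit`, and the parameters `spacing`, `tol`, `delete_below` and `insert_above`, are all hypothetical.
- Edges connect events whose separation is close to an expected base-to-base spacing. The selected sequence is the maximum-score path through the resulting directed acyclic graph.
- The confidence-threshold pass is a simple stand-in for the secondary insert/delete editing step the abstract mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical candidate base call: trace position, base letter, confidence."""
    pos: float
    base: str
    score: float


def best_path(events: List[Event], spacing: float, tol: float) -> List[Event]:
    """Pick the max-total-score chain of events with near-regular spacing.

    Assumed graph: nodes are events sorted by position, and an edge i -> j
    exists when |(pos_j - pos_i) - spacing| <= tol. Edges only point forward
    in position, so the graph is a DAG and dynamic programming finds the
    best path in O(n^2).
    """
    ev = sorted(events, key=lambda e: e.pos)
    n = len(ev)
    if n == 0:
        return []
    best = [e.score for e in ev]  # best total score of a path ending at node i
    prev = [-1] * n               # predecessor of node i on that best path
    for j in range(n):
        for i in range(j):
            gap = ev[j].pos - ev[i].pos
            if abs(gap - spacing) <= tol and best[i] + ev[j].score > best[j]:
                best[j] = best[i] + ev[j].score
                prev[j] = i
    # Backtrack from the highest-scoring endpoint.
    k = max(range(n), key=lambda idx: best[idx])
    path = []
    while k != -1:
        path.append(ev[k])
        k = prev[k]
    return path[::-1]


def secondary_edit(called: List[Event], candidates: List[Event],
                   delete_below: float, insert_above: float) -> List[Event]:
    """Hypothetical confidence-based editing pass.

    Deletes called bases scoring below `delete_below` and inserts rejected
    candidates scoring at least `insert_above`. Moving the two thresholds
    trades deletion errors against insertion errors.
    """
    kept = [e for e in called if e.score >= delete_below]
    chosen = set(called)
    extra = [e for e in candidates if e not in chosen and e.score >= insert_above]
    return sorted(kept + extra, key=lambda e: e.pos)


# Toy candidates (invented for illustration): one spurious G close to the C.
events = [Event(10, "A", 0.9), Event(20, "C", 0.8), Event(21, "G", 0.3),
          Event(30, "T", 0.95), Event(40, "A", 0.2)]
path = best_path(events, spacing=10, tol=2)
print("".join(e.base for e in path))  # → ACTA
edited = secondary_edit(path, events, delete_below=0.5, insert_above=0.99)
print("".join(e.base for e in edited))  # → ACT
```

An exact dynamic program over a DAG is used here because forward-only edges rule out cycles. A quadratic scan is therefore enough for a sketch, though a real trace would likely restrict edges to a sliding window.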

Keywords

Biology, Deconvolution, Software, DNA sequencing, Algorithm, Limiting, Sequence analysis, Graph, Computer science, Computational biology, Data mining, Genetics, DNA, Theoretical computer science, Engineering

Publication Info

Year: 1996
Type: article
Volume: 6
Issue: 2
Pages: 80-91
Citations: 40
Access: Closed

Cite This

Anthony Berno (1996). A graph theoretic approach to the analysis of DNA sequencing data. Genome Research, 6(2), 80-91. https://doi.org/10.1101/gr.6.2.80

Identifiers

DOI: 10.1101/gr.6.2.80