Abstract

We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned conserved segments. For each iteration, the expected distribution of matrix scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number or a fixed proportion of false positive segments. With sufficiently high cutoff scores, the procedure converged for all alignment blocks studied, with varying numbers of iterations required. Different methods for calculating weight matrices from alignment blocks were compared. The most effective of those tested was a logarithm-of-odds, Bayesian-based approach that used prior residue probabilities calculated from a mixture of Dirichlet distributions. The procedure described was used to detect novel conserved motifs of potential biological importance.
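
The iterative loop outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions: background residue frequencies are uniform, the weight matrix uses simple background-proportional pseudocounts in place of the Dirichlet-mixture priors the abstract reports as most effective, only the best-scoring segment per database sequence is considered, and the cutoff is supplied by hand rather than derived from the expected score distribution under a random model. The names `build_pssm`, `best_segment`, and `iterate_block` are hypothetical, and sequences are assumed to contain only the 20 standard residues.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequencies, for illustration only.
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def build_pssm(block, pseudocount=1.0):
    """Build a log-odds weight matrix from equal-length aligned segments."""
    width = len(block[0])
    pssm = []
    for col in range(width):
        counts = Counter(seg[col] for seg in block)
        total = len(block) + pseudocount
        column = {}
        for aa in AMINO_ACIDS:
            # Background-proportional pseudocounts: a simple stand-in
            # for Dirichlet-mixture priors, not the paper's method.
            freq = (counts[aa] + pseudocount * BACKGROUND[aa]) / total
            column[aa] = math.log2(freq / BACKGROUND[aa])
        pssm.append(column)
    return pssm


def best_segment(pssm, seq):
    """Return (score, segment) for the best-scoring window, or None."""
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        seg = seq[start:start + width]
        score = sum(col[aa] for col, aa in zip(pssm, seg))
        if best is None or score > best[0]:
            best = (score, seg)
    return best


def iterate_block(seed_block, database, cutoff, max_iter=20):
    """Rescan the database with an evolving matrix until the block stops changing."""
    block = sorted(set(seed_block))
    for _ in range(max_iter):
        pssm = build_pssm(block)
        hits = set()
        for seq in database:
            found = best_segment(pssm, seq)
            if found is not None and found[0] >= cutoff:
                hits.add(found[1])
        if not hits:
            return []  # nothing passed the cutoff
        new_block = sorted(hits)
        if new_block == block:
            break  # converged: same segments as the previous iteration
        block = new_block
    return block
```

In this sketch a higher `cutoff` admits fewer segments per iteration, which mirrors the abstract's observation that the procedure converged when cutoff scores were sufficiently high.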

Keywords

Cutoff, Sequence (biology), Algorithm, Mathematics, Set (abstract data type), Bayesian probability, Iterative method, Computer science, Sequence alignment, Logarithm, Dirichlet distribution, Multiple sequence alignment, Pattern recognition (psychology), Combinatorics, Statistics, Biology, Artificial intelligence, Peptide sequence, Genetics, Physics

Publication Info

Year: 1994
Type: article
Volume: 91
Issue: 25
Pages: 12091-12095
Citations: 286
Access: Closed

Cite This

Roman L. Tatusov, Stephen F. Altschul, Eugene V. Koonin (1994). Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proceedings of the National Academy of Sciences, 91(25), 12091-12095. https://doi.org/10.1073/pnas.91.25.12091

Identifiers

DOI: 10.1073/pnas.91.25.12091