Abstract
A hidden Markov model of signal peptides has been developed. It contains submodels for the N-terminal part, the hydrophobic region, and the region around the cleavage site. For known signal peptides, the model can be used to assign objective boundaries between these three regions. When the model is applied to our data, the length distributions of the three regions differ significantly from expectations. For instance, the assigned hydrophobic region is between 8 and 12 residues long in almost all eukaryotic signal peptides. This analysis also makes clear the differences between eukaryotes, Gram-positive bacteria, and Gram-negative bacteria. The model can be used to predict the location of the cleavage site, which it finds correctly in nearly 70% of signal peptides in a cross-validated test, almost the same accuracy as the best previous method. One of the problems for existing prediction methods is poor discrimination between signal peptides and uncleaved signal anchors, but the hidden Markov model improves this substantially when it is extended with a very simple signal anchor model.
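The idea of assigning region boundaries with a hidden Markov model can be illustrated with a minimal sketch. The code below is not the published model: it assumes a toy left-to-right HMM with one state per region (`n`, `h`, `c`), a two-class emission alphabet (hydrophobic or not), and invented probabilities. The names `assign_regions`, `P_HYDROPHOBIC`, and `TRANS` are hypothetical. It uses standard Viterbi decoding to label each residue with its most probable region.

```python
import math

# Hypothetical toy parameters for illustration only, NOT the published model.
STATES = ["n", "h", "c"]
HYDROPHOBIC = set("AVLIMFWC")

# Assumed probability that each region state emits a hydrophobic residue.
P_HYDROPHOBIC = {"n": 0.2, "h": 0.9, "c": 0.3}

# Left-to-right transitions: a state may repeat or advance to the next region.
TRANS = {
    ("n", "n"): 0.7, ("n", "h"): 0.3,
    ("h", "h"): 0.9, ("h", "c"): 0.1,
    ("c", "c"): 1.0,
}


def log_emit(state, residue):
    """Log-probability of a residue under the two-class emission model."""
    p = P_HYDROPHOBIC[state]
    return math.log(p if residue in HYDROPHOBIC else 1.0 - p)


def assign_regions(seq):
    """Viterbi decoding: return one n/h/c label per residue of seq."""
    if len(seq) < 3:
        raise ValueError("need at least one residue per region")
    neg_inf = float("-inf")
    # The path is forced to start in the n-region.
    score = {s: neg_inf for s in STATES}
    score["n"] = log_emit("n", seq[0])
    back = []  # back[t][state] = best predecessor of state at position t + 1
    for residue in seq[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev, best = None, neg_inf
            for prev in STATES:
                t = TRANS.get((prev, cur))
                if t is None or score[prev] == neg_inf:
                    continue  # transition not allowed or predecessor unreachable
                cand = score[prev] + math.log(t)
                if cand > best:
                    best_prev, best = prev, cand
            pointers[cur] = best_prev
            if best_prev is None:
                new_score[cur] = neg_inf
            else:
                new_score[cur] = best + log_emit(cur, residue)
        score = new_score
        back.append(pointers)
    # The path is forced to end in the c-region; follow the back-pointers.
    state = "c"
    labels = [state]
    for pointers in reversed(back):
        state = pointers[state]
        labels.append(state)
    return "".join(reversed(labels))
```

Calling `assign_regions` on a signal peptide sequence returns a label string of the same length, and the region boundaries are the positions where the label changes. A realistic model would use full amino acid emission distributions and explicit length modelling for each region, with parameters estimated from data.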
Publication Info
- Year: 1998
- Type: article
- Volume: 6
- Pages: 122-30
- Citations: 528
- Access: Closed