Abstract
Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST's programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST's speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.
Keywords
Affiliated Institutions
Related Publications
Protein homology detection by HMM–HMM comparison
Abstract Motivation: Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction and evolution. Results: We have gene...
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and s...
Fast and accurate short read alignment with Burrows–Wheeler transform
Abstract Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A...
TigrScan and GlimmerHMM: two open source <i>ab initio</i> eukaryotic gene-finders
Abstract Summary: We describe two new Generalized Hidden Markov Model implementations for ab initio eukaryotic gene prediction. The C/C++ source code for both is available as op...
Automated generation of heuristics for biological sequence comparison
Abstract Background Exhaustive methods of sequence alignment are accurate but slow, whereas heuristic approaches run quickly, but their complexity makes them more difficult to i...
Publication Info
- Year
- 2009
- Type
- article
- Volume
- 23
- Issue
- 1
- Pages
- 205-11
- Citations
- 1109
- Access
- Closed