A new generation of homology search tools based on probabilistic inference.

Abstract

Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: BLAST's programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST's speed while further improving the power of probabilistic inference based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm. Combined with efficient vector-parallel implementations on modern processors, these improvements synergize. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) in each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.
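The difference between scoring a single optimal alignment and summing over alignment uncertainty can be illustrated with a minimal sketch. The two-state model below, its state names, and all of its probabilities are hypothetical values chosen for illustration; this is not HMMER3's profile architecture or its code. The same dynamic-programming recursion gives a Viterbi score (best single path) when paths are combined with `max`, and a Forward score (all paths summed) when they are combined with a log-sum-exp.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-probabilities."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

# Hypothetical toy model: a "match" state that prefers A, and a uniform background state.
STATES = ("match", "background")
LOG_START = {"match": math.log(0.5), "background": math.log(0.5)}
LOG_TRANS = {
    "match": {"match": math.log(0.8), "background": math.log(0.2)},
    "background": {"match": math.log(0.2), "background": math.log(0.8)},
}
LOG_EMIT = {
    "match": {"A": math.log(0.7), "G": math.log(0.3)},
    "background": {"A": math.log(0.5), "G": math.log(0.5)},
}

def sequence_log_prob(seq, combine):
    """Run the HMM recursion; combine=max is Viterbi, combine=logsumexp is Forward."""
    column = {s: LOG_START[s] + LOG_EMIT[s][seq[0]] for s in STATES}
    for symbol in seq[1:]:
        column = {
            s: LOG_EMIT[s][symbol]
            + combine([column[p] + LOG_TRANS[p][s] for p in STATES])
            for s in STATES
        }
    return combine(list(column.values()))

def null_log_prob(seq):
    """Null hypothesis (an assumption here): every residue comes from the background."""
    return sum(LOG_EMIT["background"][c] for c in seq)

def log_odds_bits(seq, combine):
    """Log-odds score of model versus null, in bits."""
    return (sequence_log_prob(seq, combine) - null_log_prob(seq)) / math.log(2)

seq = "AAGAAAGA"
viterbi_bits = log_odds_bits(seq, max)        # single best path
forward_bits = log_odds_bits(seq, logsumexp)  # summed over all paths
print(f"Viterbi: {viterbi_bits:.3f} bits, Forward: {forward_bits:.3f} bits")
```

Because the Forward score sums the probability of every path, including the best one, it can never be lower than the Viterbi score for the same sequence and model.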

Keywords

Probabilistic logic, Inference, Computer science, Hidden Markov model, Statistical model, Machine learning, Theoretical computer science, Artificial intelligence, Algorithm

Publication Info

Year: 2009
Type: article
Volume: 23
Issue: 1
Pages: 205-11
Citations: 1109
Access: Closed

Cite This

APA Style

                            
Sean R. Eddy (2009). A new generation of homology search tools based on probabilistic inference. PubMed, 23(1), 205-11.