Abstract

Motivation: Not all residues in a protein are equally important. Some are essential for the proper structure and function of the protein, whereas others can be readily replaced. Conservation analysis is one of the most widely used methods for predicting these functionally important residues in protein sequences.

Results: We introduce an information-theoretic approach for estimating sequence conservation based on Jensen–Shannon divergence. We also develop a general heuristic that considers the estimated conservation of sequentially neighboring sites. In large-scale testing, we demonstrate that our combined approach outperforms previous conservation-based measures in identifying functionally important residues; in particular, it is significantly better than the commonly used Shannon entropy measure. We find that considering conservation at sequential neighbors improves the performance of all methods tested. Our analysis also reveals that many existing methods that attempt to incorporate the relationships between amino acids do not lead to better identification of functionally important sites. Finally, we find that while conservation is highly predictive in identifying catalytic sites and residues near bound ligands, it is much less effective in identifying residues in protein–protein interfaces.

Availability: Data sets and code for all conservation measures evaluated are available at http://compbio.cs.princeton.edu/conservation/

Contact: mona@cs.princeton.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
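The Results summary names two ingredients: a per-column conservation score based on Jensen–Shannon divergence, and a heuristic that mixes each site's score with those of its sequential neighbors. The abstract does not give the formulas, so the following is only a minimal Python sketch of that general idea, not the authors' implementation. The uniform background distribution, the pseudocount, the window size, the mixing weight `lam`, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1e-6):
    """Estimate amino acid frequencies for one alignment column.

    Gap characters and any other non-standard symbols are ignored.
    The pseudocount value is an illustrative assumption.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def conservation_scores(alignment, background=None):
    """Score each column by its divergence from a background distribution.

    A uniform background is assumed here purely for simplicity.
    """
    if background is None:
        background = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    return [
        js_divergence(column_distribution(col), background)
        for col in zip(*alignment)
    ]


def window_smooth(scores, window=3, lam=0.5):
    """Mix each score with the mean score of its sequential neighbors.

    `window` (neighbors on each side) and `lam` (weight on the site's
    own score) are hypothetical defaults chosen for this sketch.
    """
    smoothed = []
    for i, s in enumerate(scores):
        lo, hi = max(0, i - window), min(len(scores), i + window + 1)
        neighbors = [scores[j] for j in range(lo, hi) if j != i]
        if neighbors:
            s = lam * s + (1 - lam) * sum(neighbors) / len(neighbors)
        smoothed.append(s)
    return smoothed
```

For a toy alignment such as `["ACD", "ACE", "ACF"]`, `conservation_scores` gives the invariant first column a higher score than the variable third column, and `window_smooth` can then be applied to the resulting list. A background estimated from real protein data would likely be a better choice than the uniform one assumed here.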

Keywords

Heuristic, Conserved sequence, Computer science, Entropy (arrow of time), Sequence (biology), Function (biology), Data mining, Computational biology, Mathematics, Biology, Peptide sequence, Artificial intelligence, Genetics, Gene

Publication Info

Year: 2007
Type: article
Volume: 23
Issue: 15
Pages: 1875-1882
Citations: 712
Access: Closed

Citation Metrics

712 citations (OpenAlex)

Cite This

John A. Capra, Mona Singh (2007). Predicting functionally important residues from sequence conservation. Bioinformatics, 23(15), 1875-1882. https://doi.org/10.1093/bioinformatics/btm270

Identifiers

DOI: 10.1093/bioinformatics/btm270