Abstract
Abstract The database of known protein three‐dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology‐derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicity. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three‐dimensional detail by homology.
Keywords
Affiliated Institutions
Related Publications
Touring protein fold space with Dali/FSSP
The FSSP database and its new supplement, the Dali Domain Dictionary, present a continuously updated classification of all known 3D protein structures. The classification is der...
Fast protein fold recognition via sequence to structure alignment and contact capacity potentials.
We propose new empirical scoring potentials and associated alignment procedures for optimally aligning protein sequences to protein structures. The method has two main applicati...
Profile analysis: detection of distantly related proteins.
Profile analysis is a method for detecting distantly related proteins by sequence comparison. The basis for comparison is not only the customary Dayhoff mutational-distance matr...
T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension
This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides an easy and intuitive access to the most p...
Protein Structure Prediction and Structural Genomics
Genome sequencing projects are producing linear amino acid sequences, but full understanding of the biological role of these proteins will require knowledge of their structure a...
Publication Info
- Year
- 1991
- Type
- article
- Volume
- 9
- Issue
- 1
- Pages
- 56-68
- Citations
- 1639
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1002/prot.340090107