Abstract

The database of known protein three‐dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology‐derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three‐dimensional detail by homology.
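The idea of a length-dependent homology threshold can be illustrated with a minimal Python sketch. The abstract does not give the curve itself; the constants below (a power law of 290.15 · L^-0.562 for alignment lengths of 10 to 80 residues, a flat 24.8% identity above 80 residues, and no inference below 10 residues) are the form in which the curve is commonly quoted and should be treated as assumptions here. The function names and the `margin` parameter are hypothetical, introduced only for illustration.

```python
from typing import Optional


def homology_threshold(alignment_length: int) -> Optional[float]:
    """Return an assumed percent-identity threshold for a given alignment length.

    Constants are assumptions (commonly quoted form of the curve), not values
    stated in the abstract. Returns None when the alignment is too short for
    any inference.
    """
    if alignment_length < 10:
        return None  # too short: no structural homology inferred
    if alignment_length <= 80:
        # Short alignments need much higher identity to be significant.
        return 290.15 * alignment_length ** -0.562
    return 24.8  # long alignments: flat threshold


def is_structurally_homologous(
    percent_identity: float, alignment_length: int, margin: float = 0.0
) -> bool:
    """Check whether an alignment lies on or above the threshold curve.

    `margin` (hypothetical) shifts the curve upward, in percentage points,
    for a more conservative cutoff.
    """
    threshold = homology_threshold(alignment_length)
    if threshold is None:
        return False
    return percent_identity >= threshold + margin
```

Under these assumed constants, an alignment with 30% identity over 100 residues lies above the curve, while the same identity over only 20 residues does not, which is the length dependence described in observation (4).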

Keywords

Structural alignment; Loop modeling; Protein structure database; Homology (biology); Structural Classification of Proteins database; Database; Sequence alignment; Homology modeling; Threading (protein sequence); Sequence database; Structural similarity; Protein structure; Protein secondary structure; Sequence (biology); Similarity (geometry); Computational biology; Biology; Peptide sequence; Protein structure prediction; Genetics; Computer science; Artificial intelligence; Amino acid; Gene

Publication Info

Year: 1991
Type: article
Volume: 9
Issue: 1
Pages: 56-68
Citations: 1639
Access: Closed

Cite This

Chris Sander, Reinhard Schneider (1991). Database of homology‐derived protein structures and the structural meaning of sequence alignment. Proteins Structure Function and Bioinformatics, 9(1), 56-68. https://doi.org/10.1002/prot.340090107

Identifiers

DOI: 10.1002/prot.340090107