Abstract

A heuristic algorithm for associating Gene Ontology (GO) molecular functions with protein domains listed in the ProDom and CDD databases is described. The algorithm generates function-domain association rules from the intersection of the functions that the GO consortium has assigned to gene products containing ProDom and/or CDD domains at varying levels of sequence similarity. The hierarchical nature of GO molecular functions is incorporated into rule generation. Manual review of a subset of the generated rules indicates an accuracy rate of 87% for ProDom rules and 84% for CDD rules. These associations are useful because a novel sequence can be assigned a putative function if it is sufficiently similar to a ProDom or CDD domain with which one or more GO functions have been associated. Although functional assignments are increasingly being made for gene products from model organisms, it is likely that the needs of investigators will continue to outpace the efforts of curators, particularly for nonmodel organisms. A comparison with other methods in terms of coverage and agreement was performed, indicating the utility of the approach. The domain-function associations and function assignments are available from our website, http://www.cbil.upenn.edu/GO.
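
The intersection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy hierarchy, the protein identifiers, and the function names (`with_ancestors`, `most_specific`, `domain_rule`) are all hypothetical, and the sequence-similarity thresholds mentioned in the abstract are not modeled. The sketch assumes that each protein's annotations are first expanded with their ancestor terms, so that two proteins annotated with different specific functions can still agree on a shared, more general one.

```python
from typing import Dict, Iterable, Set

# Toy fragment of a GO-like hierarchy (hypothetical): term -> direct parent terms.
PARENTS: Dict[str, Set[str]] = {
    "protein kinase activity": {"kinase activity"},
    "lipid kinase activity": {"kinase activity"},
    "kinase activity": {"catalytic activity"},
    "catalytic activity": set(),
}


def with_ancestors(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given terms together with all of their ancestors."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def most_specific(terms: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Drop every term that is an ancestor of another term in the set."""
    implied: Set[str] = set()
    for term in terms:
        implied |= with_ancestors(parents.get(term, ()), parents)
    return terms - implied


def domain_rule(
    proteins_with_domain: Iterable[str],
    annotations: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
) -> Set[str]:
    """Functions shared by all annotated proteins that contain the domain."""
    closed_sets = [
        with_ancestors(annotations[p], parents)
        for p in proteins_with_domain
        if annotations.get(p)  # skip proteins without any annotation
    ]
    if not closed_sets:
        return set()
    shared = set.intersection(*closed_sets)
    return most_specific(shared, parents)


# Hypothetical annotations for two proteins that contain the same domain.
annotations = {
    "P1": {"protein kinase activity"},
    "P2": {"lipid kinase activity"},
}
print(domain_rule(["P1", "P2"], annotations, PARENTS))  # {'kinase activity'}
```

In this toy case the two proteins share no exact annotation, but taking the hierarchy into account yields the more general rule "kinase activity" for the domain.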

Keywords

Gene ontology; Domain (mathematical analysis); Function (biology); Heuristic; Intersection (aeronautics); Biology; Ontology; Similarity (geometry); Computational biology; Sequence (biology); Protein domain; Gene; Data mining; Computer science; Genetics; Artificial intelligence; Mathematics; Engineering

Publication Info

Year: 2002
Type: article
Volume: 12
Issue: 4
Pages: 648-655
Citations: 94
Access: Closed

Cite This

Jonathan Schug, Sharon Diskin, Joan M. Mazzarelli et al. (2002). Predicting Gene Ontology Functions from ProDom and CDD Protein Domains. Genome Research, 12(4), 648-655. https://doi.org/10.1101/gr.222902

Identifiers

DOI: 10.1101/gr.222902