Abstract

The functional classification of genes on a genome-wide scale is now in its infancy, and we make a first attempt to assess existing methods and identify sources of error. To this end, we compared two independent efforts for associating proteins with functions, one implemented by FlyBase and the other by PANTHER at Celera Genomics. Both methods make inferences based on sequence similarity and the available experimental evidence. However, they differ considerably in methodology and process. Overall, assuming that the systematic error across the two methods is relatively small, we find the protein-to-function association error rate of both the FlyBase and PANTHER methods to be <2%. The primary source of error for both methods appears to be simple human error. Although homology-based inference can certainly cause errors in annotation, our analysis indicates that the frequency of such errors is relatively small compared with the number of correct inferences. Moreover, these homology errors can be minimized by careful tree-based inference, such as that implemented in PANTHER. Often, functional associations are made by one method and not the other, indicating that one of the greatest challenges lies in improving the completeness of available ontology associations.
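The comparison described above depends on a simple set view of annotations: a protein-to-function association is asserted by both methods, or by only one of them. The sketch below is a hypothetical illustration under stated assumptions, not the authors' pipeline. It assumes each association can be represented as a (protein identifier, function term) pair. It also assumes that the number of erroneous associations has already been counted separately; expert review is one possible way to do that, and it is an assumption here rather than something stated in the abstract. The protein IDs and function labels are invented toy values.

```python
from typing import NamedTuple

# Assumption: an association is a (protein_id, function_term) pair.
Association = tuple[str, str]


class Comparison(NamedTuple):
    shared: set[Association]   # asserted by both methods
    only_a: set[Association]   # asserted only by the first method
    only_b: set[Association]   # asserted only by the second method


def compare_associations(a: set[Association], b: set[Association]) -> Comparison:
    """Split two annotation sets into agreements and one-method-only associations."""
    return Comparison(shared=a & b, only_a=a - b, only_b=b - a)


def error_rate(n_errors: int, n_associations: int) -> float:
    """Fraction of a method's associations judged erroneous (errors counted separately)."""
    if n_associations == 0:
        raise ValueError("no associations to assess")
    return n_errors / n_associations


if __name__ == "__main__":
    # Hypothetical toy data, not values from the study.
    method_a = {("P1", "kinase"), ("P2", "transporter"), ("P3", "protease")}
    method_b = {("P1", "kinase"), ("P3", "protease"), ("P4", "receptor")}
    result = compare_associations(method_a, method_b)
    print(len(result.shared), len(result.only_a), len(result.only_b))  # 2 1 1
```

The one-method-only sets correspond to the completeness gap the abstract highlights. Large `only_a` or `only_b` sets relative to `shared` suggest that associations are missing from one resource, not necessarily that either resource is wrong.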

Keywords

Inference, Biology, Computational biology, Annotation, Genomics, Drosophila melanogaster, Genome, Computer science, Genetics, Artificial intelligence, Gene

Publication Info

Year: 2003
Type: article
Volume: 13
Issue: 9
Pages: 2118-2128
Citations: 46
Access: Closed

Cite This

Huaiyu Mi, Jody Vandergriff, Michael J. Campbell et al. (2003). Assessment of Genome-Wide Protein Function Classification for Drosophila melanogaster. Genome Research, 13(9), 2118-2128. https://doi.org/10.1101/gr.771603

Identifiers

DOI
10.1101/gr.771603