Abstract
Motivation
A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem.

Results
We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein–protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations.

Availability and implementation
Web server: http://deepgo.bio2vec.net
Source code: https://github.com/bio-ontology-research-group/deepgo

Supplementary information
Supplementary data are available at Bioinformatics online.
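The abstract notes that predictions follow the structure of the GO and that dependencies between GO classes are used as background information. One common way to make multi-label scores respect such a hierarchy is to ensure that every class scores at least as high as any of its descendants. The sketch below illustrates that idea only; it is not taken from the paper's source code, and the toy hierarchy, the class names, and the `propagate_scores` helper are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: class -> list of direct parent classes.
# The identifiers are placeholders, not real GO terms.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "AB": ["A", "B"],  # a class may have more than one parent
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return all direct and indirect parents of `term`."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


def propagate_scores(
    scores: Dict[str, float], parents: Dict[str, List[str]]
) -> Dict[str, float]:
    """Raise each ancestor's score to at least the score of its descendants."""
    consistent = dict(scores)
    for term, score in scores.items():
        for anc in ancestors(term, parents):
            consistent[anc] = max(consistent.get(anc, 0.0), score)
    return consistent


# Raw per-class scores, e.g. from independent classifiers (made-up values).
raw = {"root": 0.5, "A": 0.2, "B": 0.1, "A1": 0.9, "AB": 0.4}
consistent = propagate_scores(raw, PARENTS)
print(consistent)
```

After propagation, a confident prediction for a specific class (here `A1`) also lifts the scores of its more general ancestors, so the output never asserts a specific function while denying the general one.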
Related Publications
DeepFunc: A Deep Learning Framework for Accurate Prediction of Protein Functions from Protein Sequences and Interactions
Annotation of protein functions plays an important role in understanding life at the molecular level. High‐throughput sequencing produces massive numbers of raw protein...
Large-Scale Protein Annotation through Gene Ontology
Recent progress in genomic sequencing, computational biology, and ontology development has presented an opportunity to investigate biological systems from a unique perspective, ...
Functional evaluation of domain–domain interactions and human protein interaction networks
Motivation: Large amounts of protein and domain interaction data are being produced by experimental high-throughput techniques and computational approaches. To gain ins...
FunSimMat: a comprehensive functional similarity database
Functional similarity based on Gene Ontology (GO) annotation is used in diverse applications like gene clustering, gene expression data analysis, protein interaction prediction ...
UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB
Motivation: Similarity-based methods have been widely used in order to infer the properties of genes and gene products containing little or no experimental annotation. ...
Publication Info
- Year: 2017
- Type: article
- Volume: 34
- Issue: 4
- Pages: 660-668
- Citations: 533
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/btx624