Abstract

Motivation: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that can use multiprocessor machines, conventional clusters, or both to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site, and the open-source code is hosted at Google Code.

Availability and implementation: InterProScan is distributed via FTP at ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/ and the source code is available from http://code.google.com/p/interproscan/.

Contact: http://www.ebi.ac.uk/support or interhelp@ebi.ac.uk or mitchell@ebi.ac.uk
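The abstract describes a framework that spreads the analysis of very large sequence sets across multiprocessor machines or clusters. As a generic illustration only, the Java sketch below shows one common way to handle the single-machine case: split the input sequences into fixed-size batches and analyse the batches concurrently on a thread pool. It is not InterProScan's implementation. The class name `ParallelBatchSketch`, the batch size, and the `analyseBatch` stub (which merely counts residues in place of real analyses) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch (not InterProScan code): partition input protein
 * sequences into batches and analyse the batches on a local thread pool.
 */
class ParallelBatchSketch {

    /** Split the input into consecutive batches of at most batchSize items. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Hypothetical placeholder for per-batch work. A real pipeline would run
     * its analyses here; this stub only sums the sequence lengths.
     */
    static int analyseBatch(List<String> batch) {
        int residues = 0;
        for (String seq : batch) {
            residues += seq.length();
        }
        return residues;
    }

    /** Analyse all batches on a pool of `workers` threads and combine results. */
    static int analyseAll(List<String> sequences, int batchSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> batch : partition(sequences, batchSize)) {
                futures.add(pool.submit(() -> analyseBatch(batch)));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // blocks until that batch has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> seqs = List.of("MKV", "MA", "MKVLA", "M", "MAAA");
        System.out.println(partition(seqs, 2).size()); // prints 3
        System.out.println(analyseAll(seqs, 2, 4));    // prints 15
    }
}
```

Because each batch is self-contained, the same partitioning could in principle be handed to separate cluster jobs instead of local threads; that distributed mode is not shown in this sketch.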

Keywords

File Transfer Protocol, Unix, Computer science, Java, Operating system, Scalability, Source code, Software, Download, Code (set theory), Function (biology), Programming language, Database, Set (abstract data type), The Internet, Biology

Publication Info

Year: 2014
Type: article
Volume: 30
Issue: 9
Pages: 1236-1240
Citations: 9048
Access: Closed

Cite This

Philip Jones, David Binns, Hsin-Yu Chang et al. (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics, 30(9), 1236-1240. https://doi.org/10.1093/bioinformatics/btu031

Identifiers

DOI: 10.1093/bioinformatics/btu031