Abstract
Correct prediction of the structure of protein-coding genes of higher eukaryotes is still a difficult task; therefore, public databases are heavily contaminated with mispredicted sequences. The high rate of misprediction has serious consequences because it significantly affects the conclusions that may be drawn from genome-scale sequence analyses of eukaryotic genomes. Here we present the MisPred database and computational pipeline that provide efficient means for the identification of erroneous sequences in public databases. The MisPred database contains a collection of abnormal, incomplete and mispredicted protein sequences from 19 metazoan species identified as erroneous by MisPred quality control tools in the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, NCBI/RefSeq and EnsEMBL databases. Major releases of the database are automatically generated and updated regularly. The database (http://www.mispred.com) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats. DATABASE URL: http://www.mispred.com.
Keywords
Affiliated Institutions
Related Publications
The Protein Information Resource: an integrated public resource of functional annotation of proteins
The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific disc...
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequen...
UniProt: the Universal Protein knowledgebase
To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein d...
The Universal Protein Resource (UniProt)
The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. F...
The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domains structur...
Publication Info
- Year
- 2013
- Type
- article
- Volume
- 2013
- Pages
- bat053-bat053
- Citations
- 17
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1093/database/bat053