Probability-based protein identification by searching sequence databases using mass spectrometry data

Abstract

Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.
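The "simple rule" for significance mentioned in the abstract can be illustrated with a small sketch. The abstract gives no formula, so the following rests on assumptions: that a score is reported as -10·log10(P), where P is the probability that a match is a random event (the convention usually associated with Mascot), and that the threshold is corrected for the number of database entries tested. The function names are hypothetical and are not part of any Mascot interface.

```python
import math


def score_from_probability(p: float) -> float:
    """Convert a random-match probability into a -10*log10(p) score.

    Assumption: this mirrors the commonly cited score convention; it is
    not taken from the abstract itself.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


def significance_threshold(n_candidates: int, alpha: float = 0.05) -> float:
    """Score a match must exceed to be significant at level alpha.

    Assumption: a Bonferroni-style correction, i.e. a random match at
    this score is expected with frequency below alpha across all
    n_candidates entries compared against the data.
    """
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    return -10.0 * math.log10(alpha / n_candidates)


def is_significant(score: float, n_candidates: int, alpha: float = 0.05) -> bool:
    """Apply the rule: significant if the score exceeds the threshold."""
    return score > significance_threshold(n_candidates, alpha)


if __name__ == "__main__":
    # Hypothetical search against 500,000 candidate entries.
    threshold = significance_threshold(500_000)
    print(f"threshold: {threshold:.1f}")
    print(is_significant(75.0, 500_000))
    print(is_significant(60.0, 500_000))
```

Under these assumptions the threshold rises by 10 score units for every tenfold increase in the number of candidates, which is why the same score can be significant against a small database and not against a large one.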

Keywords

Mascot; Computer science; False positive paradox; Database search engine; Tandem mass spectrometry; Mass spectrometry; Data mining; Identification (biology); Context (archaeology); Sequence database; Tandem mass tag; Protein sequencing; Sequence (biology); Peptide sequence; Search engine; Proteomics; Chemistry; Artificial intelligence; Information retrieval; Quantitative proteomics; Chromatography; Biology

Related Publications

The PRIDE database and related tools and resources in 2019: improving support for quantification data

Yasset Pérez‐Riverol, Attila Csordás, Jingwen Bai +20 more

This FAIRsharing record describes: The PRIDE PRoteomics IDEntifications (PRIDE) Archive database is a centralized, standards compliant, public data repository for mass spectrome...

2018 Nucleic Acids Research 7132 citations

The Pfam protein families database

Robert Finn, John Tate, Jaina Mistry +8 more

Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam ...

2007 Nucleic Acids Research 1831 citations

The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data

C. T. Porter

The Catalytic Site Atlas (CSA) provides catalytic residue annotation for enzymes in the Protein Data Bank. It is available online at http://www.ebi.ac.uk/thornton-srv/databases/...

2003 Nucleic Acids Research 608 citations

The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences

Yasset Pérez‐Riverol, Jingwen Bai, Chakradhar Bandla +11 more

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE ...

2021 Nucleic Acids Research 6330 citations

MetaboAnalyst 4.0: towards more transparent and integrative metabolomics analysis

Jasmine Chong, Othman Soufan, Carin Li +5 more

We present a new update to MetaboAnalyst (version 4.0) for comprehensive metabolomic data analysis, interpretation, and integration with other omics data. Since the last major u...

2018 Nucleic Acids Research 3661 citations

Publication Info

Year: 1999
Type: article
Volume: 20
Issue: 18
Pages: 3551-3567
Citations: 8216
Access: Closed

Cite This

APA Style

David N. Perkins, Darryl Pappin, David M. Creasy et al. (1999). Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 20(18), 3551-3567. https://doi.org/10.1002/(sici)1522-2683(19991201)20:18<3551::aid-elps3551>3.0.co;2-2

Identifiers

DOI: 10.1002/(sici)1522-2683(19991201)20:18<3551::aid-elps3551>3.0.co;2-2