Abstract

With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features in a genomewide data set are tested against some null hypothesis, where a number of features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the well-known p value, except that it is a measure of significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.
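The abstract does not spell out how a q value is computed, so the following is only a minimal sketch of one common formulation, not the authors' exact procedure. It assumes the proportion of true null features (called `pi0` here) is estimated at a single fixed tuning value `lam`, rather than by smoothing over a range of values. The function name `q_values` and the parameter `lam` are hypothetical names chosen for illustration.

```python
def q_values(p_values, lam=0.5):
    """Sketch: convert a list of p values into q values.

    Assumptions (not taken from the abstract): pi0 is estimated at one
    fixed threshold `lam`, and p values above `lam` are treated as
    coming mostly from true null features.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Estimate the proportion of true nulls from the p values above lam.
    pi0 = sum(1 for p in p_values if p > lam) / (m * (1.0 - lam))
    # Cap at 1; fall back to the conservative value 1.0 if the estimate is 0.
    pi0 = min(1.0, pi0) if pi0 > 0 else 1.0

    # Indices of the p values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each q value is the smallest
    # estimated false discovery rate at which that feature is called significant.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        fdr_estimate = pi0 * m * p_values[idx] / rank
        running_min = min(running_min, fdr_estimate)
        q[idx] = running_min
    return q


if __name__ == "__main__":
    example = [0.01, 0.02, 0.03, 0.6, 0.8]
    for p, q in zip(example, q_values(example)):
        print(f"p = {p:.2f}  q = {q:.3f}")
```

Calling every feature with a q value at or below, say, 0.05 significant then targets an expected proportion of false positives among the called features of about 5%, whereas thresholding p values controls the rate of false positives among the truly null features.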

Keywords

False positive paradox; False discovery rate; False positives and false negatives; False positive rate; Statistical hypothesis testing; Multiple comparisons problem; Genome; Computer science; Null hypothesis; Set (abstract data type); Linkage (software); Feature (linguistics); Computational biology; Measure (data warehouse); Data mining; Statistics; Biology; Genetics; Artificial intelligence; Mathematics; Gene

Related Publications

A Direct Approach to False Discovery Rates

Summary Multiple-hypothesis testing involves guarding against much more complicated errors than single-hypothesis testing. Whereas we typically control the type I error rate for...

2002 Journal of the Royal Statistical Soci... 5607 citations

Publication Info

Year: 2003
Type: article
Volume: 100
Issue: 16
Pages: 9440-9445
Citations: 9812
Access: Closed

Cite This

John D. Storey, Robert Tibshirani (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, 100(16), 9440-9445. https://doi.org/10.1073/pnas.1530509100

Identifiers

DOI: 10.1073/pnas.1530509100