Abstract
Odds ratios or other effect sizes estimated from genome scans are upwardly biased, because only the top‐ranking associations are reported, and moreover only if they reach a defined level of significance. No unbiased estimate exists based on data selected in this fashion, but replication studies are routinely performed that allow unbiased estimation of the effect sizes. Estimation based on replication data alone is inefficient in the sense that the initial scan could, in principle, contribute information on the effect size. We propose an unbiased estimator combining information from both the initial scan and the replication study, which is more efficient than that based just on the replication. Specifically, we adjust the standard combined estimate to allow for selection by rank and significance in the initial scan. Our approach explicitly allows for multiple associations arising from a scan, and is robust to mis‐specification of a significance threshold. We require replication data to be available but argue that, in most applications, estimates of effect sizes are only useful when associations have been replicated. We illustrate our approach on some recently completed scans and explore its efficiency by simulation. Genet. Epidemiol. 33:406–418, 2009. © 2009 Wiley‐Liss, Inc.
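The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the estimator proposed in the paper; it only shows, under simplifying assumptions, why an estimate taken from a scan after selection on significance is inflated, why the replication estimate is not, and why the standard (unadjusted) inverse-variance combination of the two remains biased. The function name, the normal model for the estimated log odds ratio, and all parameter values (`true_beta`, `se_scan`, `se_rep`, `z_threshold`) are hypothetical choices made for illustration, and selection by rank is not modelled.

```python
import random
import statistics


def simulate_selection_bias(true_beta=0.1, se_scan=0.05, se_rep=0.05,
                            z_threshold=3.0, n_sims=20000, seed=0):
    """Simulate one variant many times; keep only runs where the scan is 'significant'.

    Assumption: the estimated log odds ratio in each study is normally
    distributed around true_beta with a known standard error.
    """
    rng = random.Random(seed)
    # Inverse-variance weights for the standard combined estimate.
    w_scan = 1.0 / se_scan ** 2
    w_rep = 1.0 / se_rep ** 2

    scan, rep, combined = [], [], []
    for _ in range(n_sims):
        b_scan = rng.gauss(true_beta, se_scan)
        # Selection step: the association is reported only if its
        # z-score in the initial scan exceeds the threshold.
        if b_scan / se_scan <= z_threshold:
            continue
        # The replication study is independent of the selection.
        b_rep = rng.gauss(true_beta, se_rep)
        scan.append(b_scan)
        rep.append(b_rep)
        combined.append((w_scan * b_scan + w_rep * b_rep) / (w_scan + w_rep))

    return {
        "n_selected": len(scan),
        "scan": statistics.mean(scan),
        "replication": statistics.mean(rep),
        "combined": statistics.mean(combined),
    }


if __name__ == "__main__":
    for name, value in simulate_selection_bias().items():
        print(name, value)
```

With these illustrative settings, the mean scan estimate among selected runs lies well above `true_beta`, the mean replication estimate lies close to it, and the unadjusted combined estimate falls in between. This is the gap that an adjustment for selection, such as the one the abstract describes, is meant to close.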
Related Publications
Statistical Significance Versus Clinical Importance of Observed Effect Sizes: What Do P Values and Confidence Intervals Really Represent?
Effect size measures are used to quantify treatment effects or associations between variables. Such measures, of which >70 have been described in the literature, include unst...
Beyond significance testing: Reforming data analysis methods in behavioral research.
Practices of data analysis in psychology and related disciplines are changing. This is evident in the longstanding controversy about statistical tests in the behavioral sciences...
Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences
Motivation: In RNA-seq differential expression analysis, investigators aim to detect those genes with changes in expression level across conditions, despite technical an...
Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation
We construct a prediction rule on the basis of some data, and then wish to estimate the error rate of this rule in classifying future observations. Cross-validation pro...
Why Most Discovered True Associations Are Inflated
Newly discovered true (non-null) associations often have inflated effects compared with the true effect sizes. I discuss here the main reasons for this inflation. First, theoret...
Publication Info
- Year: 2009
- Type: article
- Volume: 33
- Issue: 5
- Pages: 406-418
- Citations: 65
- Access: Closed
Identifiers
- DOI
- 10.1002/gepi.20394