Abstract
Non-biological experimental variation, or "batch effects," is commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (> 25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.
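The empirical Bayes idea named in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' software or their full model: it adjusts only the additive (location) part of a batch effect, assumes a normal prior whose mean and variance are estimated by simple moments across genes, and requires at least two samples per batch. The function name `shrink_batch_means` and the data layout (one list of sample values per gene) are hypothetical choices made for this example.

```python
from statistics import mean, pvariance, variance


def shrink_batch_means(data, batches):
    """Remove shrunken additive batch offsets (illustrative only).

    data:    list of genes, each a list of expression values (one per sample)
    batches: list of batch labels, one per sample
    Assumes every batch has at least two samples.
    """
    adjusted = [row[:] for row in data]
    for b in sorted(set(batches)):
        idx = [j for j, lab in enumerate(batches) if lab == b]
        n = len(idx)
        # Raw per-gene offset: batch mean minus the gene's grand mean.
        gamma_hat = [mean(row[j] for j in idx) - mean(row) for row in data]
        # Per-gene within-batch sample variance.
        s2 = [variance([row[j] for j in idx]) for row in data]
        # Moment estimates of the prior, pooled across all genes.
        prior_mean = mean(gamma_hat)
        tau2 = pvariance(gamma_hat)
        for g, row in enumerate(data):
            noise = s2[g] / n  # sampling variance of the raw offset
            denom = tau2 + noise
            # Weight on the gene's own estimate; noisy estimates shrink more.
            w = tau2 / denom if denom > 0 else 0.0
            gamma_star = w * gamma_hat[g] + (1 - w) * prior_mean
            for j in idx:
                adjusted[g][j] = row[j] - gamma_star
    return adjusted
```

The weight `w` shows why pooling across genes can help when batches are small: with few samples, each gene's raw offset is noisy, so its estimate is pulled toward the offset shared by all genes in that batch.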
Publication Info
- Year
- 2006
- Type
- article
- Volume
- 8
- Issue
- 1
- Pages
- 118-127
- Citations
- 8511
- Access
- Closed
Identifiers
- DOI
- 10.1093/biostatistics/kxj037