Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies

2018 Nature Genetics 1,433 citations

Abstract

In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-the-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes by large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for >1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.
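
To make the central idea of the abstract concrete, the following is a minimal sketch of a saddlepoint-approximated P value for a score statistic on a binary trait. It is not the SAIGE implementation: it assumes independent samples (no mixed model, no relatedness), takes the fitted null probabilities `mu` as given, and uses a Barndorff-Nielsen-type tail formula. The function names, the `cutoff` parameter, and the fallback to the normal approximation near the mean are all choices made for this illustration.

```python
import math


def _norm_sf(z):
    """Upper tail 1 - Phi(z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def _cgf(t, g, mu):
    """K(t), K'(t), K''(t) for the centered score S = sum_i g_i (Y_i - mu_i),
    with Y_i ~ Bernoulli(mu_i) independent under the null (an assumption)."""
    k0 = k1 = k2 = 0.0
    for gi, mi in zip(g, mu):
        e = math.exp(gi * t)
        denom = 1.0 - mi + mi * e
        k0 += math.log(denom) - t * gi * mi
        k1 += gi * mi * e / denom - gi * mi
        k2 += gi * gi * mi * (1.0 - mi) * e / (denom * denom)
    return k0, k1, k2


def _saddlepoint(s, g, mu):
    """Solve K'(t) = s by bisection; K' is increasing in t."""
    lo, hi = -1.0, 1.0
    for _ in range(8):  # widen the bracket up to |t| = 128
        if _cgf(lo, g, mu)[1] < s < _cgf(hi, g, mu)[1]:
            break
        lo, hi = 2.0 * lo, 2.0 * hi
    else:
        raise ValueError("score is outside the range this sketch can handle")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _cgf(mid, g, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def _spa_z(s, g, mu):
    """Return z such that P(S >= s) is approximated by 1 - Phi(z)."""
    t = _saddlepoint(s, g, mu)
    k0, _, k2 = _cgf(t, g, mu)
    w = math.copysign(math.sqrt(2.0 * (t * s - k0)), t)
    v = t * math.sqrt(k2)
    return w + math.log(v / w) / w


def spa_pvalue(s, g, mu, cutoff=2.0):
    """Two-sided P value for the observed centered score s.

    Within `cutoff` standard deviations of zero the plain normal
    approximation is used (a choice of this sketch, which also avoids
    the 0/0 form of the tail formula at s = 0).
    """
    sd = math.sqrt(sum(gi * gi * mi * (1.0 - mi) for gi, mi in zip(g, mu)))
    a = abs(s)
    if a < cutoff * sd:
        return 2.0 * _norm_sf(a / sd)
    upper = _norm_sf(_spa_z(a, g, mu))    # P(S >= |s|)
    lower = _norm_sf(-_spa_z(-a, g, mu))  # P(S <= -|s|)
    return upper + lower
```

With balanced data (for example `mu = [0.5] * 100` and `g = [1.0] * 100`) the result stays close to the normal approximation. The two are expected to diverge when the `mu` values are small and the genotype vector is sparse, because the score distribution is then skewed, which is the situation the abstract describes.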

Keywords

Biobank; Genome-wide association study; Type I and type II errors; Sample size determination; Generalized estimating equation; Scalability; Generalized linear mixed model; Association test; Statistics; Genetic association; Sample (material); Computer science; Scale (ratio); Data mining; Binary number; Biology; Genetics; Mathematics; Single-nucleotide polymorphism

MeSH Terms

Case-Control Studies; Computer Simulation; Genome-Wide Association Study; Humans; Linear Models; Logistic Models; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide

Publication Info

Year: 2018
Type: article
Volume: 50
Issue: 9
Pages: 1335-1341
Citations: 1433
Access: Closed

Citation Metrics

OpenAlex: 1433
Influential: 96

Cite This

Wei Zhou, Jonas B. Nielsen, Lars G. Fritsche et al. (2018). Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nature Genetics, 50(9), 1335-1341. https://doi.org/10.1038/s41588-018-0184-y

Identifiers

DOI: 10.1038/s41588-018-0184-y
PMID: 30104761
PMCID: PMC6119127
