Abstract

Genotype imputation is commonly performed in genome-wide association studies because it greatly increases the number of markers that can be tested for association with a trait. In general, one should perform genotype imputation using the largest available reference panel because the number of accurately imputed variants increases with reference panel size. However, one impediment to using larger reference panels is the increased computational cost of imputation. We present a new genotype imputation method, Beagle 5.0, which greatly reduces the computational cost of imputation from large reference panels. We compare Beagle 5.0 with Beagle 4.1, Impute4, Minimac3, and Minimac4 using 1000 Genomes Project data, Haplotype Reference Consortium data, and simulated data for 10k, 100k, 1M, and 10M reference samples. All methods produce nearly identical accuracy, but Beagle 5.0 has the lowest computation time and the best scaling of computation time with increasing reference panel size. For 10k, 100k, 1M, and 10M reference samples and 1,000 phased target samples, Beagle 5.0 is 3× (10k), 12× (100k), 43× (1M), and 533× (10M) faster than the fastest alternative method. Cost data from the Amazon Elastic Compute Cloud show that Beagle 5.0 can perform genome-wide imputation from 10M reference samples into 1,000 phased target samples at a cost of less than one US cent per sample.

Keywords

Computer science; Statistics; Mathematics

MeSH Terms

Computational Biology; Genome, Human; Genome-Wide Association Study; Haplotypes; Humans; Software

Publication Info

Year: 2018
Type: article
Volume: 103
Issue: 3
Pages: 338-348
Citations: 2096
Access: Closed

Citation Metrics

OpenAlex: 2096
CrossRef: 1724

Cite This

Brian L. Browning, Ying Zhou, Sharon R. Browning (2018). A One-Penny Imputed Genome from Next-Generation Reference Panels. The American Journal of Human Genetics, 103(3), 338-348. https://doi.org/10.1016/j.ajhg.2018.07.015

Identifiers

DOI: 10.1016/j.ajhg.2018.07.015
PMID: 30100085
PMCID: PMC6128308
