Abstract
Genotype imputation is commonly performed in genome-wide association studies because it greatly increases the number of markers that can be tested for association with a trait. In general, one should perform genotype imputation using the largest reference panel that is available because the number of accurately imputed variants increases with reference panel size. However, one impediment to using larger reference panels is the increased computational cost of imputation. We present a new genotype imputation method, Beagle 5.0, which greatly reduces the computational cost of imputation from large reference panels. We compare Beagle 5.0 with Beagle 4.1, Impute4, Minimac3, and Minimac4 using 1000 Genomes Project data, Haplotype Reference Consortium data, and simulated data for 10k, 100k, 1M, and 10M reference samples. All methods produce nearly identical accuracy, but Beagle 5.0 has the lowest computation time and the best scaling of computation time with increasing reference panel size. For 1,000 phased target samples, Beagle 5.0 is 3× (10k), 12× (100k), 43× (1M), and 533× (10M) faster than the fastest alternative method at each reference panel size. Cost data from the Amazon Elastic Compute Cloud show that Beagle 5.0 can perform genome-wide imputation from 10M reference samples into 1,000 phased target samples at a cost of less than one US cent per sample.
Publication Info
- Year: 2018
- Type: article
- Volume: 103
- Issue: 3
- Pages: 338-348
- Citations: 2096
- Access: Closed
Identifiers
- DOI: 10.1016/j.ajhg.2018.07.015
- PMID: 30100085
- PMCID: PMC6128308