Abstract
This paper presents an expectation-maximization (EM) algorithm for estimating allele frequencies, haplotype frequencies, and gametic disequilibrium coefficients in multiple-locus systems. It permits high polymorphism and null alleles at all loci. The approach deals effectively with the two primary estimation problems in such systems: phenotypic and genotypic categories do not correspond one-to-one, and sample sizes tend to be much smaller than the number of phenotypic categories. The EM method provides maximum-likelihood estimates and therefore allows hypothesis tests based on likelihood-ratio statistics, which follow χ² distributions in large samples. We also suggest a data resampling approach to estimate the sampling distributions of the test statistics. The resampling approach is more computer intensive, but it is applicable to all sample sizes. We recommend a strategy for testing hypotheses about aggregate groups of gametic disequilibrium coefficients. This strategy minimizes the number of hypothesis tests needed while still describing the structure of disequilibrium. The methods are applied to three unlinked dinucleotide repeat loci in Navajo Indians and to three linked HLA loci in Gila River (Pima) Indians. The likelihood functions of both data sets are shown to be maximized by the EM estimates, and the testing strategy provides a useful description of the structure of gametic disequilibrium. Following these applications, a number of simulation experiments are performed to test how well χ² distributions approximate the distributions of the likelihood-ratio statistics. In most circumstances the χ² approximation grossly underestimated the probability of type I errors; at times, however, it overestimated that probability. Accordingly, we recommend hypothesis tests that use the resampling method.
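To make the EM idea concrete, here is a minimal sketch for a much simpler case than the one treated in the paper: two biallelic, codominant loci with no null alleles, where the only ambiguous genotype is the double heterozygote. The function name `em_two_locus`, the 3x3 count layout, and the starting values are illustrative assumptions and are not taken from the paper.

```python
def em_two_locus(n, tol=1e-10, max_iter=1000):
    """EM estimates of haplotype frequencies for two biallelic loci.

    n[i][j] is the number of individuals carrying i copies of allele A
    and j copies of allele B (i, j in 0..2). Assumes a non-empty sample,
    random mating, and no null alleles (a simplification).
    """
    total = 2 * sum(sum(row) for row in n)  # number of sampled gametes

    # Haplotype counts contributed by genotypes whose phase is unambiguous.
    base = {
        "AB": 2 * n[2][2] + n[2][1] + n[1][2],
        "Ab": 2 * n[2][0] + n[2][1] + n[1][0],
        "aB": 2 * n[0][2] + n[1][2] + n[0][1],
        "ab": 2 * n[0][0] + n[1][0] + n[0][1],
    }
    dh = n[1][1]  # double heterozygotes: AB/ab or Ab/aB, phase unknown

    p = {h: 0.25 for h in base}  # arbitrary starting frequencies
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes that are AB/ab.
        cis = p["AB"] * p["ab"]
        trans = p["Ab"] * p["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5

        # M-step: re-estimate frequencies from expected haplotype counts.
        new_p = {
            "AB": (base["AB"] + w * dh) / total,
            "ab": (base["ab"] + w * dh) / total,
            "Ab": (base["Ab"] + (1 - w) * dh) / total,
            "aB": (base["aB"] + (1 - w) * dh) / total,
        }
        delta = max(abs(new_p[h] - p[h]) for h in p)
        p = new_p
        if delta < tol:
            break

    # Gametic disequilibrium coefficient D = p(AB) - p(A) * p(B).
    p_a = p["AB"] + p["Ab"]
    p_b = p["AB"] + p["aB"]
    return p, p["AB"] - p_a * p_b
```

Calling `em_two_locus(counts)` on a 3x3 table of genotype counts returns the estimated haplotype frequencies and the coefficient D. Extending this to many alleles and null alleles, as the abstract describes, enlarges the set of genotypes compatible with each phenotype but keeps the same alternation of expected counts and frequency updates.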
Publication Info
- Year: 1995
- Type: article
- Volume: 56
- Issue: 3
- Pages: 799-810
- Citations: 533
- Access: Closed