A Bayesian approach to the identification of panmictic populations and the assignment of individuals

Kevin J. Dawson; Khalid Belkhir

doi:10.1017/s001667230100502x

Abstract

We present likelihood-based methods for assigning the individuals in a sample to source populations, on the basis of their genotypes at co-dominant marker loci. The source populations are assumed to be at Hardy–Weinberg and linkage equilibrium, but the allelic composition of these source populations and even the number of source populations represented in the sample are treated as uncertain. The parameter of interest is the partition of the set of sampled individuals, induced by the assignment of individuals to source populations. We present a maximum likelihood method, and then a more powerful Bayesian approach for estimating this sample partition. In general, it will not be feasible to evaluate the evidence supporting each possible partition of the sample. Furthermore, when the number of individuals in the sample is large, it may not even be feasible to evaluate the evidence supporting, individually, each of the most plausible partitions because there may be many individuals which are difficult to assign. To overcome these problems, we use low-dimensional marginals (the ‘co-assignment probabilities’) of the posterior distribution of the sample partition as measures of ‘similarity’, and then apply a hierarchical clustering algorithm to identify clusters of individuals whose assignment together is well supported by the posterior distribution. A binary tree provides a visual representation of how well the posterior distribution supports each cluster in the hierarchy. These methods are applicable to other problems where the parameter of interest is a partition of a set. Because the co-assignment probabilities are independent of the arbitrary labelling of source populations, we avoid the label-switching problem of previous Bayesian methods.
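The co-assignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy posterior draws, and the choice of average linkage as the hierarchical clustering rule are all assumptions made for illustration. The sketch takes sampled partitions (one arbitrary label per individual), estimates each pairwise co-assignment probability as the fraction of draws in which two individuals share a label, and then merges clusters greedily by mean co-assignment probability.

```python
from itertools import combinations


def co_assignment(samples):
    """Fraction of sampled partitions in which each pair shares a cluster."""
    n = len(samples[0])
    probs = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in samples if labels[i] == labels[j])
        probs[i][j] = probs[j][i] = together / len(samples)
    return probs


def average_linkage(probs):
    """Agglomerative clustering with co-assignment probability as similarity.

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples,
    which is enough to draw a binary tree. Average linkage is an assumption.
    """
    clusters = [(i,) for i in range(len(probs))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Mean co-assignment probability over all cross-cluster pairs.
            sim = sum(probs[i][j] for i in a for j in b) / (len(a) * len(b))
            if best is None or sim > best[0]:
                best = (sim, a, b)
        sim, a, b = best
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((a, b, sim))
    return merges


# Hypothetical posterior draws for 4 individuals; label values are arbitrary.
samples = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first draw, labels swapped
    [0, 0, 0, 1],
    [2, 2, 1, 0],
]
probs = co_assignment(samples)
merges = average_linkage(probs)
```

Because only equality of labels within a draw is compared, the first two draws contribute identically even though their labels are swapped, which is the sense in which co-assignment probabilities sidestep label switching.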

Keywords

Partition (number theory); Posterior probability; Bayesian probability; Mathematics; Sample (material); Hierarchical clustering; Computer science; Statistics; Cluster analysis; Combinatorics

Related Publications

Inference of Population Structure Under a Dirichlet Process Model

John P. Huelsenbeck, Peter Andolfatto

Abstract Inferring population structure from genetic data sampled from some number of individuals is a formidable statistical problem. One widely used approach considers the num...

2007 Genetics 293 citations

Bayesian Variable Selection in Linear Regression

Toby J. Mitchell, John J. Beauchamp

Abstract This article is concerned with the selection of subsets of predictor variables in a linear regression model for the prediction of a dependent variable. It is based on a...

1988 Journal of the American Statistical A... 1367 citations

Bayesian Analysis of Genetic Differentiation Between Populations

Jukka Corander, Patrik Waldmann, Mikko J. Sillanpää

Abstract We introduce a Bayesian method for estimating hidden population substructure using multilocus molecular markers and geographical information provided by the sampling de...

2003 Genetics 874 citations

Variable Selection via Gibbs Sampling

Edward I. George, Robert E. McCulloch

Abstract A crucial problem in building a multiple regression model is the selection of predictors to include. The main thrust of this article is to propose and develop a procedu...

1993 Journal of the American Statistical A... 2675 citations

Hierarchical Phylogenetic Models for Analyzing Multipartite Sequence Data

Marc A. Suchard, Christina M. Ramirez, Janet S. Sinsheimer +1 more

Debate exists over how to incorporate information from multipartite sequence data in phylogenetic analyses. Strict combined-data approaches argue for concatenation of all partit...

2003 Systematic Biology 153 citations

Publication Info

Year: 2001
Type: article
Volume: 78
Issue: 1
Pages: 59-77
Citations: 229
Access: Closed

Cite This

APA Style

                            
Kevin J. Dawson, Khalid Belkhir (2001). A Bayesian approach to the identification of panmictic populations and the assignment of individuals. Genetics Research, 78(1), 59-77. https://doi.org/10.1017/s001667230100502x
