Abstract

Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general "phase change" phenomenon about the ability to detect structure in genetic data, which emerges from the statistical theory we use, and has an important implication for the ability to discover structure in genetic data: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like F_ST) below a threshold is essentially undetectable, but a little above threshold, detection will be easy. This means that we can predict the dataset size needed to detect structure.
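The phase-change behaviour described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure and does not implement their formal significance test. It is a toy model under stated assumptions: two equal-sized populations simulated with a Balding-Nichols-style beta model, unlinked markers, and a comparison of the leading eigenvalue against the random-matrix bulk edge (1 + sqrt(m/n))^2 for m individuals and n markers. The function names, the sample sizes, and the rule-of-thumb threshold F_ST of about 1/sqrt(nm) used to pick the simulated divergence are all assumptions of this sketch.

```python
import math
import random


def simulate(m_per_pop, n_markers, fst, rng):
    """Return one genotype column (0/1/2 per individual) per marker.

    Assumption: Balding-Nichols-style model, two populations of equal size.
    """
    columns = []
    for _ in range(n_markers):
        p = rng.uniform(0.1, 0.9)  # ancestral allele frequency
        col = []
        for _pop in range(2):
            if fst > 0:
                scale = (1 - fst) / fst
                q = rng.betavariate(p * scale, (1 - p) * scale)
            else:
                q = p  # no divergence: both populations share p
            # Diploid genotype: sum of two Bernoulli(q) allele draws.
            col.extend((rng.random() < q) + (rng.random() < q)
                       for _ in range(m_per_pop))
        columns.append(col)
    return columns


def normalized_columns(columns):
    """Center each marker and scale it to roughly unit variance."""
    out = []
    for col in columns:
        mu = sum(col) / len(col)
        p_hat = mu / 2
        if p_hat <= 0 or p_hat >= 1:
            continue  # skip monomorphic markers
        sd = math.sqrt(2 * p_hat * (1 - p_hat))
        out.append([(g - mu) / sd for g in col])
    return out


def top_eigenvalue_ratio(columns, iterations=300):
    """Leading eigenvalue of the individual-by-individual matrix,
    divided by the mean of the nonzero eigenvalues."""
    cols = normalized_columns(columns)
    m = len(cols[0])
    gram = [[0.0] * m for _ in range(m)]
    for col in cols:
        for i in range(m):
            ci, row = col[i], gram[i]
            for j in range(m):
                row[j] += ci * col[j]
    # Power iteration with an arbitrary deterministic start vector.
    v = [math.sin(i + 1.0) for i in range(m)]
    lam = 0.0
    for _ in range(iterations):
        w = [sum(r[j] * v[j] for j in range(m)) for r in gram]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    # Centering removes one dimension, so m - 1 eigenvalues are nonzero.
    mean_eig = sum(gram[i][i] for i in range(m)) / (m - 1)
    return lam / mean_eig


rng = random.Random(0)
m_per_pop, n_markers = 20, 1000
m_total = 2 * m_per_pop
edge = (1 + math.sqrt(m_total / n_markers)) ** 2   # bulk edge under no structure
threshold = 1 / math.sqrt(m_total * n_markers)      # assumed rule-of-thumb F_ST threshold

null_ratio = top_eigenvalue_ratio(simulate(m_per_pop, n_markers, 0.0, rng))
struct_ratio = top_eigenvalue_ratio(simulate(m_per_pop, n_markers, 10 * threshold, rng))

print(f"bulk edge: {edge:.3f}")
print(f"no structure, top eigenvalue ratio: {null_ratio:.3f}")
print(f"F_ST = {10 * threshold:.3f}, top eigenvalue ratio: {struct_ratio:.3f}")
```

With no divergence the leading eigenvalue should sit near the bulk edge, while a divergence well above the assumed threshold should push it clearly beyond the edge. Changing `m_per_pop` or `n_markers` moves the threshold, which is the sense in which the dataset size needed to detect structure can be predicted.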

Keywords

Statistic; Divergence (linguistics); Population; Principal component analysis; Population structure; Biology; Statistical hypothesis testing; Statistics; Population size; Population genetics; Genetic structure; Computer science; Evolutionary biology; Artificial intelligence; Genetic variation; Mathematics; Demography

Publication Info

Year: 2006
Type: article
Volume: 2
Issue: 12
Pages: e190-e190
Citations: 5408 (OpenAlex)
Access: Closed

Cite This

Nick Patterson, Alkes L. Price, David Reich (2006). Population Structure and Eigenanalysis. PLoS Genetics, 2(12), e190-e190. https://doi.org/10.1371/journal.pgen.0020190

Identifiers

DOI: 10.1371/journal.pgen.0020190