Abstract
Kingman's coalescent process opens the door to estimating population genetics model parameters from molecular sequences. One paramount parameter of interest is the effective population size. Temporal variation of this quantity characterizes the demographic history of a population. Because researchers are rarely able to choose a priori a deterministic model describing effective population size dynamics for the data at hand, nonparametric curve-fitting methods based on multiple change-point (MCP) models have been developed. We propose an alternative to change-point modeling that exploits Gaussian Markov random fields to achieve temporal smoothing of the effective population size in a Bayesian framework. The main advantage of our approach is that, in contrast to MCP models, the explicit temporal smoothing does not require strong prior decisions. To approximate the posterior distribution of the population dynamics, we use efficient, fast-mixing Markov chain Monte Carlo algorithms designed for highly structured Gaussian models. In a simulation study, we demonstrate that the proposed temporal smoothing method, named Bayesian skyride, successfully recovers "true" population size trajectories in all simulation scenarios and competes well with the MCP approaches without invoking strong prior assumptions. We apply our Bayesian skyride method to 2 real data sets. We analyze sequences of hepatitis C virus contemporaneously sampled in Egypt, reproducing all key known aspects of the viral population dynamics. Next, we estimate the demographic histories of human influenza A hemagglutinin sequences, serially sampled throughout 3 flu seasons.
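The temporal smoothing idea in the abstract can be illustrated with a generic first-order random-walk Gaussian Markov random field (GMRF) placed on log effective population sizes. The sketch below is an illustration under stated assumptions, not the paper's implementation: the function name `gmrf_log_prior`, the uniform (unweighted) differences between adjacent intervals, and the example trajectories are all hypothetical. It evaluates an unnormalized log prior that penalizes squared differences between neighboring log population sizes, so smoother trajectories score higher.

```python
import math
from typing import Sequence


def gmrf_log_prior(log_sizes: Sequence[float], precision: float) -> float:
    """Unnormalized log-density of a first-order random-walk GMRF.

    log_sizes: log effective population sizes, one per time interval
               (hypothetical input layout, assumed for illustration).
    precision: positive smoothing parameter; larger values penalize
               changes between adjacent intervals more strongly.
    """
    if precision <= 0:
        raise ValueError("precision must be positive")
    m = len(log_sizes)
    if m < 2:
        raise ValueError("need at least two intervals")
    # Sum of squared differences between neighboring intervals.
    sq_diff = sum((b - a) ** 2 for a, b in zip(log_sizes, log_sizes[1:]))
    # The intrinsic GMRF has rank m - 1, hence the (m - 1) / 2 exponent.
    return 0.5 * (m - 1) * math.log(precision) - 0.5 * precision * sq_diff


# Made-up trajectories: one changes gradually, one jumps around.
smooth = [math.log(x) for x in (100.0, 110.0, 120.0, 130.0)]
jagged = [math.log(x) for x in (100.0, 10.0, 500.0, 20.0)]
smooth_score = gmrf_log_prior(smooth, 1.0)
jagged_score = gmrf_log_prior(jagged, 1.0)
```

In a full Bayesian analysis this prior would be combined with a coalescent likelihood and the precision would itself receive a prior; neither step is shown here.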
Publication Info
- Year: 2008
- Type: article
- Volume: 25
- Issue: 7
- Pages: 1459-1471
- Citations: 735
- Access: Closed
Identifiers
- DOI: 10.1093/molbev/msn090