Abstract
We studied the importance of assuming an appropriate model in Bayesian phylogenetics by examining >5,000 Bayesian analyses and six nested models of nucleotide substitution. Model misspecification can strongly bias bipartition posterior probability estimates. These biases were most pronounced when rate heterogeneity was ignored. The type of bias seen at a particular bipartition appeared to be strongly influenced by the lengths of the branches surrounding that bipartition. In the Felsenstein zone, posterior probability estimates of bipartitions were biased when the assumed model was underparameterized but were unbiased when the assumed model was overparameterized. In the inverse Felsenstein zone, however, both underparameterization and overparameterization led to biased bipartition posterior probabilities, although the bias caused by overparameterization was less pronounced and disappeared with increased sequence length. Model parameter estimates were also affected by model misspecification. Underparameterization biased some parameter estimates, such as branch lengths and the gamma shape parameter, whereas overparameterization decreased the precision of some parameter estimates. We caution researchers to ensure that the most appropriate model is assumed by employing both a priori model choice methods and a posteriori model adequacy tests.
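The claim that ignoring rate heterogeneity biases branch-length estimates can be illustrated with a small analytic sketch. This is not the Bayesian procedure used in the study; it is a simplified pairwise-distance calculation under the Jukes-Cantor (JC69) model, and the function names and the chosen values of the true distance and gamma shape parameter are assumptions made purely for illustration.

```python
import math

def expected_p_jc_gamma(d, alpha):
    """Expected proportion of differing sites for two sequences separated by
    d substitutions per site under JC69 with gamma-distributed rates (shape alpha)."""
    return 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha))

def jc_distance_equal_rates(p):
    """JC69 distance estimate that (wrongly, here) assumes equal rates across sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc_distance_gamma(p, alpha):
    """JC69 distance estimate that accounts for gamma-distributed rates."""
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)

# Hypothetical values chosen for illustration only.
true_d = 1.0   # true distance in substitutions per site
alpha = 0.5    # strong rate heterogeneity

p = expected_p_jc_gamma(true_d, alpha)
underparameterized = jc_distance_equal_rates(p)   # ignores rate heterogeneity
correctly_specified = jc_distance_gamma(p, alpha)

print(f"expected p-distance:            {p:.4f}")
print(f"estimate ignoring gamma rates:  {underparameterized:.4f}")
print(f"estimate with gamma rates:      {correctly_specified:.4f}")
```

In this toy setting the equal-rates estimate falls well below the true distance, while the gamma-aware estimate recovers it. That is one simple way to see why an underparameterized model can distort branch lengths.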
Related Publications
Robustness of partial least-squares method for estimating latent variable quality structures
Latent variable structural models and the partial least-squares (PLS) estimation procedure have found increased interest since being used in the context of customer satisfaction...
Modelling multiple sources of dissemination bias in meta‐analysis
Asymmetry in the funnel plot for a meta‐analysis suggests the presence of dissemination bias. This may be caused by publication bias through the decisions of journal ed...
Comparative Performance of Bayesian and AIC-Based Measures of Phylogenetic Model Uncertainty
Reversible-jump Markov chain Monte Carlo (RJ-MCMC) is a technique for simultaneously evaluating multiple related (but not necessarily nested) statistical models that has recentl...
Wald Lecture: On the Bernstein-von Mises theorem with infinite-dimensional parameters
If there are many independent, identically distributed observations governed by a smooth, finite-dimensional statistical model, the Bayes estimate and the maximum likelihood e...
Hierarchical Phylogenetic Models for Analyzing Multipartite Sequence Data
Debate exists over how to incorporate information from multipartite sequence data in phylogenetic analyses. Strict combined-data approaches argue for concatenation of all partit...
Publication Info
- Year: 2004
- Type: article
- Volume: 53
- Issue: 2
- Pages: 265-277
- Citations: 339
- Access: Closed
Identifiers
- DOI: 10.1080/10635150490423520