Abstract
The use of parameter-rich substitution models in molecular phylogenetics has been criticized on the basis that these models can cause a reduction both in accuracy and in the ability to discriminate among competing topologies. We have explored the relationship between nucleotide substitution model complexity and nonparametric bootstrap support under maximum likelihood (ML) for six data sets for which the true relationships are known with a high degree of certainty. We also performed equally weighted maximum parsimony analyses in order to assess the effects of ignoring branch length information during tree selection. We observed that maximum parsimony gave the lowest mean estimate of bootstrap support for the correct set of nodes relative to the ML models for every data set except one. For several data sets, we established that the exact distribution used to model among-site rate variation was critical for a successful phylogenetic analysis. Site-specific rate models were shown to perform very poorly relative to gamma and invariable sites models for several of the data sets most likely because of the gross underestimation of branch lengths. The invariable sites model also performed poorly for several data sets where this model had a poor fit to the data, suggesting that addition of the gamma distribution can be critical. Estimates of bootstrap support for the correct nodes often increased under gamma and invariable sites models relative to equal rates models. Our observations are contrary to the prediction that such models cause reduced confidence in phylogenetic hypotheses. Our results raise several issues regarding the process of model selection, and we briefly discuss model selection uncertainty and the role of sensitivity analyses in molecular phylogenetics.
Keywords
Affiliated Institutions
Related Publications
Model Misspecification and Probabilistic Tests of Topology: Evidence from Empirical Data Sets
Probabilistic tests of topology offer a powerful means of evaluating competing phylogenetic hypotheses. The performance of the nonparametric Shimodaira-Hasegawa (SH) test, the p...
Ultrafast Approximation for Phylogenetic Bootstrap
Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data,...
UFBoot2: Improving the Ultrafast Bootstrap Approximation
The standard bootstrap (SBS), despite being computationally intensive, is widely used in maximum likelihood phylogenetic analyses. We recently proposed the ultrafast bootstrap a...
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction metho...
New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0
PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasona...
Publication Info
- Year
- 2002
- Type
- article
- Volume
- 19
- Issue
- 4
- Pages
- 394-405
- Citations
- 105
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1093/oxfordjournals.molbev.a004094