Abstract
In order to have confidence in model-based phylogenetic analysis, the model of nucleotide substitution adopted must be selected in a statistically rigorous manner. Several model-selection methods are applicable to maximum likelihood (ML) analysis, including the hierarchical likelihood-ratio test (hLRT), Akaike information criterion (AIC), Bayesian information criterion (BIC), and decision theory (DT), but their performance relative to empirical data has not been investigated thoroughly. In this study, we use 250 phylogenetic data sets obtained from TreeBASE to examine the effects that choice in model selection has on ML estimation of phylogeny, with an emphasis on optimal topology, bootstrap support, and hypothesis testing. We show that the use of different methods leads to the selection of two or more models for approximately 80% of the data sets and that the AIC typically selects more complex models than alternative approaches. Although ML estimation with different best-fit models results in incongruent tree topologies approximately 50% of the time, these differences are primarily attributable to alternative resolutions of poorly supported nodes. Furthermore, topologies and bootstrap values estimated with ML using alternative statistically supported models are more similar to each other than to topologies and bootstrap values estimated with ML under the Kimura two-parameter (K2P) model or maximum parsimony (MP). In addition, Swofford-Olsen-Waddell-Hillis (SOWH) tests indicate that ML trees estimated with alternative best-fit models are usually not significantly different from each other when evaluated with the same model. However, ML trees estimated with statistically supported models are often significantly suboptimal to ML trees made with the K2P model when both are evaluated with K2P, indicating that not all models perform in an equivalent manner. Nevertheless, the use of alternative statistically supported models generally does not affect tests of monophyletic relationships under either the Shimodaira-Hasegawa (S-H) or SOWH methods. Our results suggest that although choice in model selection has a strong impact on optimal tree topology, it rarely affects evolutionary inferences drawn from the data because differences are mainly confined to poorly supported nodes. Moreover, since ML with alternative best-fit models tends to produce more similar estimates of phylogeny than ML under the K2P model or MP, the use of any statistically based model-selection method is vastly preferable to forgoing the model-selection process altogether.
Keywords
Affiliated Institutions
Related Publications
The Effects of Nucleotide Substitution Model Assumptions on Estimates of Nonparametric Bootstrap Support
The use of parameter-rich substitution models in molecular phylogenetics has been criticized on the basis that these models can cause a reduction both in accuracy and in the abi...
Ultrafast Approximation for Phylogenetic Bootstrap
Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data,...
Accounting for Uncertainty in the Tree Topology Has Little Effect on the Decision-Theoretic Approach to Model Selection in Phylogeny Estimation
Currently available methods for model selection used in phylogenetic analysis are based on an initial fixed-tree topology. Once a model is picked based on this topology, a rigor...
Empirical Problems of the Hierarchical Likelihood Ratio Test for Model Selection
Advocates of maximum likelihood (ML) approaches to phylogenetics commonly cite as one of their primary advantages the use of objective statistical criteria for model selection. ...
RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference
Abstract Motivation Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the op...
Publication Info
- Year
- 2008
- Type
- article
- Volume
- 57
- Issue
- 1
- Pages
- 76-85
- Citations
- 144
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1080/10635150801898920