Abstract
As models of sequence evolution become increasingly complex, many criteria for model selection have been proposed, and tools are available to select the best model for an alignment under a particular criterion. However, in many instances the selected model fails to explain the data adequately, as reflected by large deviations between the observed pattern frequencies and their expectations under the model. We present MISFITS, an approach to evaluate the goodness of fit (http://www.cibiv.at/software/misfits). MISFITS introduces a minimum number of "extra substitutions" on the inferred tree to provide a biologically motivated explanation of why the alignment may deviate from expectation. These extra substitutions, together with the evolutionary model, then fully explain the alignment. We illustrate the method on several examples and then survey the goodness of fit of the selected models to the alignments in the PANDIT database.
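To make the notion of "deviations between observed pattern frequencies and the corresponding expectation" concrete, the following is a minimal Python sketch. It is not the MISFITS algorithm and does not place extra substitutions on a tree; it only counts alignment columns (patterns) and subtracts a hypothetical expected count. The function names and the `expected_probs` dictionary are assumptions made for illustration; in practice the expected pattern probabilities would come from the selected model and the inferred tree.

```python
from collections import Counter


def pattern_counts(alignment):
    """Count how often each pattern (alignment column) occurs."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*alignment) iterates over columns; each column becomes one string.
    return Counter("".join(column) for column in zip(*alignment))


def pattern_deviations(alignment, expected_probs):
    """Return observed minus expected count for every pattern.

    expected_probs is a hypothetical mapping from pattern to its
    probability under some fitted model (an assumption of this sketch).
    """
    counts = pattern_counts(alignment)
    n_sites = sum(counts.values())
    patterns = set(counts) | set(expected_probs)
    return {
        p: counts.get(p, 0) - n_sites * expected_probs.get(p, 0.0)
        for p in patterns
    }


# Toy alignment of three sequences and four sites (made-up data).
toy_alignment = ["AAGA", "AAGC", "AATA"]
# Made-up expected probabilities, for illustration only.
toy_expected = {"AAA": 0.5, "GGT": 0.25, "CCC": 0.25}

deviations = pattern_deviations(toy_alignment, toy_expected)
for pattern, delta in sorted(deviations.items()):
    print(pattern, delta)
```

A positive value marks a pattern seen more often than expected, and a negative value marks one seen less often. Per the abstract, MISFITS addresses such excesses and deficits by introducing extra substitutions on the inferred tree, which this sketch does not attempt.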
Publication Info
- Year: 2010
- Type: article
- Volume: 28
- Issue: 1
- Pages: 143-152
- Citations: 17
- Access: Closed
Identifiers
- DOI: 10.1093/molbev/msq180