Abstract
The correct interpretation of any phylogenetic tree depends on that tree being correctly rooted. We present STRIDE, a fast, effective, and outgroup-free method for identifying gene duplication events and inferring the species tree root in large-scale molecular phylogenetic analyses. STRIDE identifies sets of well-supported in-group gene duplication events from a set of unrooted gene trees, and analyses these events to infer a probability distribution over the possible locations of the root of an unrooted species tree. We show that STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited. Furthermore, applying STRIDE to outgroup-free inference of the origin of the eukaryotic tree produced a root probability distribution that provides additional support for the leading hypotheses on the origin of the eukaryotes.
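The idea behind using in-group duplications for rooting can be illustrated with a toy sketch. It assumes each duplication event is summarised as the set of species descending from it, and uses the fact that if the species tree were rooted on a branch that separates those species, their common ancestor would be the root itself, so the duplication could not be an in-group event. The scoring below, with a hypothetical per-event error rate `epsilon` and hypothetical function names `edge_sides` and `root_posterior`, is a simple illustrative likelihood, not the probability model implemented in STRIDE.

```python
import math
from collections import defaultdict


def edge_sides(edges, leaves):
    """For each edge (u, v) of an unrooted tree, return the leaves on the v side."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    sides = {}
    for u, v in edges:
        # Depth-first search from v that never crosses back over the edge to u.
        seen = {u, v}
        stack = [v]
        found = set()
        while stack:
            node = stack.pop()
            if node in leaves:
                found.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        sides[(u, v)] = frozenset(found)
    return sides


def root_posterior(edges, leaves, duplication_species_sets, epsilon=0.05):
    """Toy posterior over root edges (uniform prior; hypothetical error model).

    A root edge is consistent with a duplication if all of the duplication's
    species lie on one side of that edge. Each event contributes (1 - epsilon)
    if consistent and epsilon otherwise.
    """
    sides = edge_sides(edges, leaves)
    log_like = {}
    for edge, side in sides.items():
        total = 0.0
        for species in duplication_species_sets:
            s = frozenset(species)
            consistent = s <= side or not (s & side)
            total += math.log(1 - epsilon) if consistent else math.log(epsilon)
        log_like[edge] = total
    # Normalise in log space for numerical stability.
    best = max(log_like.values())
    weights = {e: math.exp(v - best) for e, v in log_like.items()}
    z = sum(weights.values())
    return {e: w / z for e, w in weights.items()}
```

For the unrooted four-species tree ((A,B),(C,D)), duplications observed within {A, B} and within {C, D} are jointly consistent only with a root on the central branch, which therefore receives the highest probability:

```python
edges = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
post = root_posterior(edges, {"A", "B", "C", "D"}, [{"A", "B"}, {"C", "D"}])
```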
Publication Info
- Year
- 2017
- Type
- article
- Volume
- 34
- Issue
- 12
- Pages
- 3267-3278
- Citations
- 309
- Access
- Closed
Identifiers
- DOI
- 10.1093/molbev/msx259