A simple algorithm to infer gene duplication and speciation events on a gene tree

Abstract

Motivation: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein’s function than paralogous sequences (that diverged by gene duplication), because duplication enables functional diversification. The utility of phylogenetic information in high-throughput genome annotation (‘phylogenomics’) is widely recognized, but existing approaches are either manual or indirect (e.g. not based on phylogenetic trees). Our goal is to automate phylogenomics using explicit phylogenetic inference. A necessary component is an algorithm to infer speciation and duplication events in a given gene tree.

Results: We give an algorithm to infer speciation and duplication events on a gene tree by comparison to a trusted species tree. This algorithm has a worst-case running time of O(n²), which is inferior to two previous algorithms that are ~O(n) for a gene tree of n sequences. However, our algorithm is extremely simple, and its asymptotic worst case behavior is only realized on pathological data sets. We show empirically, using 1750 gene trees constructed from the Pfam protein family database, that it appears to be a practical (and often superior) algorithm for analyzing real gene trees.

Availability: http://www.genetics.wustl.edu/eddy/forester

Contact: zmasek@genetics.wustl.edu; eddy@genetics.wustl.edu
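
The abstract does not spell out the procedure, so the following is only a minimal sketch of one common way to infer speciation and duplication events by comparing a gene tree to a species tree. It maps each gene-tree node to the last common ancestor of its children's mapped species nodes by climbing parent links. All names here (`Node`, `make`, `infer_events`, the `species` field) are hypothetical. The sketch assumes both trees are rooted and binary, and that every gene-tree leaf is labelled with a species present in the species tree. It should not be read as the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality: nodes are compared with `is`
class Node:
    name: str = ""
    species: Optional[str] = None           # set on gene-tree leaves only (assumption)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None
    number: int = -1                        # preorder index, used on the species tree
    mapped: Optional["Node"] = None         # species-tree node this gene node maps to
    event: str = ""                         # "speciation" or "duplication" (internal nodes)


def make(name: str, *children: Node, species: Optional[str] = None) -> Node:
    """Build a node and wire up parent links (hypothetical helper)."""
    node = Node(name=name, species=species, children=list(children))
    for child in children:
        child.parent = node
    return node


def preorder(root: Node) -> List[Node]:
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(reversed(node.children))
    return out


def postorder(root: Node) -> List[Node]:
    # Reversing a parents-first ordering yields a children-first ordering.
    return list(reversed(preorder(root)))


def infer_events(gene_root: Node, species_root: Node) -> None:
    # Number species nodes so that every child has a larger number than its parent.
    for i, s in enumerate(preorder(species_root)):
        s.number = i
    leaf_by_name = {s.name: s for s in preorder(species_root) if not s.children}

    for g in postorder(gene_root):
        if not g.children:
            g.mapped = leaf_by_name[g.species]
            continue
        a, b = g.children[0].mapped, g.children[1].mapped
        # Climb from whichever node is deeper in numbering until the two meet.
        while a is not b:
            if a.number > b.number:
                a = a.parent
            else:
                b = b.parent
        g.mapped = a
        # A node mapping to the same species node as one of its children
        # is treated as a duplication; otherwise it is a speciation.
        is_dup = any(c.mapped is a for c in g.children)
        g.event = "duplication" if is_dup else "speciation"


# Toy example: species tree ((human, mouse), yeast).
species_tree = make("root", make("HM", make("human"), make("mouse")), make("yeast"))
x = make("x", make("h1", species="human"), make("m1", species="mouse"))
y = make("y", make("h2", species="human"), make("y1", species="yeast"))
gene_tree = make("g_root", x, y)
infer_events(gene_tree, species_tree)
for n in (x, y, gene_tree):
    print(n.name, n.event, "->", n.mapped.name)
```

In this toy case the two inner gene nodes are labelled as speciations and the gene-tree root as a duplication, because the root maps to the same species node as one of its children. The parent-climbing loop is what makes the worst case quadratic: on a very unbalanced species tree a single climb can take a number of steps proportional to the tree's depth.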

Keywords

Gene duplication, Simple (philosophy), Genetic algorithm, Algorithm, Tree (set theory), Computer science, Gene, Computational biology, Biology, Genetics, Mathematics, Combinatorics

Affiliated Institutions

Washington University in St. Louis US

Publication Info

Year: 2001
Type: article
Volume: 17
Issue: 9
Pages: 821-828
Citations: 209
Access: Closed

Cite This

APA Style

Christian M. Zmasek, Sean R. Eddy (2001). A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics, 17(9), 821-828. https://doi.org/10.1093/bioinformatics/17.9.821

Identifiers

DOI: 10.1093/bioinformatics/17.9.821