Abstract

Recently, attention has turned to the problem of reconstructing complete ancestral sequences from large multiple alignments. Successful generation of these genome-wide reconstructions will facilitate a greater knowledge of the events that have driven evolution. We present a new evolutionary alignment modeler, called “Ortheus,” for inferring the evolutionary history of a multiple alignment, in terms of both substitutions and, importantly, insertions and deletions. Based on a multiple sequence probabilistic transducer model of the type proposed by Holmes, Ortheus uses efficient stochastic graph-based dynamic programming methods. Unlike other methods, Ortheus does not rely on a single fixed alignment from which to work. Ortheus is also more scalable than previous methods while being fast, stable, and open source. Large-scale simulations show that Ortheus performs close to optimally on a deep mammalian phylogeny. Simulations also indicate that significant proportions of errors due to insertions and deletions can be avoided by not assuming a fixed alignment. We additionally use a challenging hold-out cross-validation procedure to test the method; using the reconstructions to predict extant sequence bases, we demonstrate significant improvements over using closest extant neighbor sequences. Accompanying this paper, a new, public, and genome-wide set of Ortheus ancestor alignments provides an intriguing new resource for evolutionary studies in mammals. As a first piece of analysis, we attempt to recover “fossilized” ancestral pseudogenes. We confidently find 31 cases in which the ancestral sequence had a more complete sequence than any of the extant sequences.

Keywords

Biology, Extant taxon, Genome, Evolutionary biology, Alignment-free sequence analysis, Sequence (biology), Phylogenetic tree, Phylogenetics, Most recent common ancestor, Multiple sequence alignment, Computational biology, Pseudogene, Ancestor, Sequence alignment, Genetics, Gene, Peptide sequence

Publication Info

Year: 2008
Type: article
Volume: 18
Issue: 11
Pages: 1829-1843
Citations: 203
Access: Closed

Citation Metrics

203 (OpenAlex)

Cite This

Benedict Paten, Javier Herrero, Stephen Fitzgerald et al. (2008). Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Research, 18(11), 1829-1843. https://doi.org/10.1101/gr.076521.108

Identifiers

DOI
10.1101/gr.076521.108