The order of sequence alignment can bias the selection of tree topology.

1991 Molecular Biology and Evolution 150 citations

Abstract

Sequential pairwise alignment of multiple sequences is a widely used procedure (Kruskal 1983). It is useful and generally successful when sequences within a set differ by relatively few substitutions. Although it is well known that differential substitution rates can artifactually bias the assessment of tree topology (Felsenstein 1978), it is not generally known that the order in which sequences are aligned can bias tree selection. To test the effect of alignment order, the classical four-taxon test has been applied to the "tree of life" (Lake et al. 1984; Woese and Olsen 1986) by using alternative alignments and three reconstruction algorithms [maximum parsimony (Fitch 1971), transversion parsimony (Brown et al. 1982), and evolutionary parsimony (Lake 1987)]. There is enormous interest in this tree because it relates all known organisms and because its topology is expected to provide insight into the evolution of modern organisms. Because the tree spans large evolutionary distances, its topology has been difficult to establish. By means of sequences from elongation factor Tu (EF-Tu), the most conserved protein sequence known to span the tree of life, it is shown that specific alignment orders systematically favor alternative trees. In particular, if taxa A and B are pairwise aligned and if C and D are pairwise aligned, the resulting alignment of the EF-Tu sequences more often gives the tree that has A and B as topological neighbors and C and D as topological neighbors, regardless of the tree reconstruction algorithm used. Because all three reconstruction algorithms produced the same tree for any particular alignment, unequal rate effects appear to be secondary for EF-Tu sequences. This indicates that order-dependent alignment biases are distinct from unequal rate effects and that, for some data, they could be as important as unequal rate effects.

Pairwise alignments of protein sequences were performed with the ALIGN program available in the Dayhoff package (Dayhoff et al. 1983). The penalty for a break was 6, and the mutation data matrix corresponded to 250 accepted point mutations with a bias of +2. These are reasonable values for the weights and correspond to those used in the examples in the description of the ALIGN program. [For an insightful discussion of alignment weights, see the paper by Fitch and Smith (1983); also see Waterman and Perlwitz (1984).] EF-Tu sequences were aligned as protein sequences to obtain more robust alignments and were back-translated into nucleic acid sequences (e.g., phe was translated as UUY, leu as YUN, arg as NGN, and ser as NNN) so that the maximum-, transversion-, and evolutionary-parsimony methods could be compared by equivalent data. Only positions consisting of a single nucleotide (i.e., U, C, A, or G but not R, Y, or N) in each of the four sequences were scored. These uniquely defined replacement sites are presumed to correspond to the most conserved nucleotide positions. A multiple alignment of four sequences can be achieved by successively aligning 1.
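
The back-translation and site-filtering steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used in the paper: only the codons for phe, leu, arg, and ser come from the text, while the remaining table entries are assumed from the standard genetic code using the same reduced alphabet (U, C, A, G, R, Y, N). The function names and the toy sequences are hypothetical, and the scoring step implements only plain unweighted maximum parsimony for four taxa, not transversion or evolutionary parsimony.

```python
# Degenerate codons in the reduced alphabet U, C, A, G, R (purine),
# Y (pyrimidine), N (any base). Entries for F, L, R and S follow the
# examples in the text; every other entry is an ASSUMPTION derived from
# the standard genetic code.
CODON = {
    "F": "UUY", "L": "YUN", "R": "NGN", "S": "NNN",  # from the text
    "A": "GCN", "N": "AAY", "D": "GAY", "C": "UGY", "Q": "CAR",
    "E": "GAR", "G": "GGN", "H": "CAY", "I": "AUN", "K": "AAR",
    "M": "AUG", "P": "CCN", "T": "ACN", "W": "UGG", "Y": "UAY",
    "V": "GUN",
    "-": "---",  # an alignment gap stays a three-column gap
}


def back_translate(protein):
    """Map an aligned protein string to its degenerate RNA string."""
    return "".join(CODON[residue] for residue in protein)


def unambiguous_columns(rna_seqs):
    """Keep only columns where every sequence has exactly U, C, A or G."""
    return [col for col in zip(*rna_seqs)
            if all(base in "UCAG" for base in col)]


def four_taxon_support(columns):
    """Count parsimony-informative sites for each unrooted four-taxon tree.

    With four taxa, a site is informative only when two taxa share one
    base and the other two share a different base.
    """
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in columns:
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    return support


# Toy aligned protein sequences for taxa A, B, C, D (hypothetical, not EF-Tu).
proteins = ["MFKW", "MFKW", "MYEW", "MYEW"]
rna = [back_translate(p) for p in proteins]
sites = unambiguous_columns(rna)
print(four_taxon_support(sites))
```

Running the same scoring on alignments built in different pairwise orders (for example A with B first versus A with C first) is one way to see whether the supported tree shifts with alignment order, which is the effect the abstract reports.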

Keywords

Tree (set theory), Pairwise comparison, Tree rearrangement, Biology, Sequence (biology), Topology (electrical circuits), Selection (genetic algorithm), Evolutionary biology, Phylogenetic tree, Algorithm, Mathematics, Combinatorics, Computer science, Artificial intelligence, Genetics, Statistics, Gene

Publication Info

Year: 1991
Type: letter
Volume: 8
Issue: 3
Pages: 378-85
Citations: 150
Access: Closed

Citation Metrics

150 (OpenAlex)

Cite This

James A. Lake (1991). The order of sequence alignment can bias the selection of tree topology. Molecular Biology and Evolution, 8(3), 378-85. https://doi.org/10.1093/oxfordjournals.molbev.a040654

Identifiers

DOI: 10.1093/oxfordjournals.molbev.a040654