Abstract
Sequential pairwise alignment of multiple sequences is a widely used procedure (Kruskal 1983). It is useful and generally successful when sequences within a set differ by relatively few substitutions. Although it is well known that differential substitution rates can artifactually bias the assessment of tree topology (Felsenstein 1978), it is not generally known that the order in which sequences are aligned can bias tree selection. To test the effect of alignment order, the classical four-taxon test has been applied to the "tree of life" (Lake et al. 1984; Woese and Olsen 1986) by using alternative alignments and three reconstruction algorithms [maximum parsimony (Fitch 1971), transversion parsimony (Brown et al. 1982), and evolutionary parsimony (Lake 1987)]. There is enormous interest in this tree because it relates all known organisms and because its topology is expected to provide insight into the evolution of modern organisms. Because the tree spans large evolutionary distances, its topology has been difficult to establish. By means of sequences from elongation factor Tu (EF-Tu), the most conserved protein sequence known to span the tree of life, it is shown that specific alignment orders systematically favor alternative trees. In particular, if taxa A and B are pairwise aligned and if C and D are pairwise aligned, the resulting alignment of the EF-Tu sequences more often gives the tree that has A and B as topological neighbors and C and D as topological neighbors, regardless of the tree reconstruction algorithm used. Because all three reconstruction algorithms produced the same tree for any particular alignment, unequal rate effects appear to be secondary for EF-Tu sequences. This indicates that order-dependent alignment biases are distinct from unequal rate effects and that, for some data, they could be as important as unequal rate effects.

Pairwise alignments of protein sequences were performed with the ALIGN program available in the Dayhoff package (Dayhoff et al. 1983). The penalty for a break was 6, and the mutation data matrix corresponded to 250 accepted point mutations with a bias of +2. These are reasonable values for the weights and correspond to those used in the examples in the description of the ALIGN program. [For an insightful discussion of alignment weights, see the paper by Fitch and Smith (1983); also see Waterman and Perlwitz (1984).] EF-Tu sequences were aligned as protein sequences to obtain more robust alignments and were then back-translated into nucleic acid sequences (e.g., phe was translated as UUY, leu as YUN, arg as NGN, and ser as NNN) so that the maximum-, transversion-, and evolutionary-parsimony methods could be compared on equivalent data. Only positions consisting of a single nucleotide (i.e., U, C, A, or G but not R, Y, or N) in each of the four sequences were scored. These uniquely defined replacement sites are presumed to correspond to the most conserved nucleotide positions. A multiple alignment of four sequences can be achieved by successively aligning 1.
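The back-translation and site-selection step described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the standard genetic code and a simple collapsing rule inferred from the four examples in the text: a codon position shared by all codons of an amino acid keeps its nucleotide, U/C becomes Y, A/G becomes R, and anything else becomes N. The function names (`back_translate`, `scored_columns`) and the use of `-` for alignment gaps are hypothetical choices made for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
CODONS = {}
for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append(a + b + c)


def degenerate(nucleotides):
    """Collapse the nucleotides seen at one codon position (assumed rule)."""
    seen = set(nucleotides)
    if len(seen) == 1:
        return next(iter(seen))
    if seen == {"U", "C"}:
        return "Y"  # pyrimidine
    if seen == {"A", "G"}:
        return "R"  # purine
    return "N"      # any other mixture is treated as fully ambiguous


def back_translate(protein):
    """Back-translate an aligned protein string; '-' (gap) becomes '---'."""
    out = []
    for aa in protein:
        if aa == "-":
            out.append("---")
        else:
            codons = CODONS[aa]
            out.append("".join(degenerate(c[i] for c in codons) for i in range(3)))
    return "".join(out)


def scored_columns(aligned_proteins):
    """Return nucleotide columns where every sequence has exactly U, C, A, or G."""
    nuc = [back_translate(p) for p in aligned_proteins]
    return [i for i in range(len(nuc[0])) if all(s[i] in BASES for s in nuc)]


if __name__ == "__main__":
    for aa in "FLRS":
        print(aa, back_translate(aa))
    print(scored_columns(["MF", "ML", "MW", "M-"]))
```

Under these assumptions the rule reproduces the four codons quoted in the text (UUY, YUN, NGN, NNN), and only columns that are unambiguous in all four sequences survive the filter. Columns containing a gap are excluded here as well, which is a further assumption.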
Publication Info
- Year: 1991
- Type: letter
- Volume: 8
- Issue: 3
- Pages: 378-85
- Citations: 150
- Access: Closed
Identifiers
- DOI: 10.1093/oxfordjournals.molbev.a040654