The order of sequence alignment can bias the selection of tree topology.

Abstract

Sequential pairwise alignment of multiple sequences is a widely used procedure (Kruskal 1983). It is useful and generally successful when sequences within a set differ by relatively few substitutions. Although it is well known that differential substitution rates can artifactually bias the assessment of tree topology (Felsenstein 1978), it is not generally known that the order in which sequences are aligned can bias tree selection. To test the effect of alignment order, the classical four-taxon test has been applied to the "tree of life" (Lake et al. 1984; Woese and Olsen 1986) by using alternative alignments and three reconstruction algorithms [maximum parsimony (Fitch 1971), transversion parsimony (Brown et al. 1982), and evolutionary parsimony (Lake 1987)]. There is enormous interest in this tree because it relates all known organisms and because its topology is expected to provide insight into the evolution of modern organisms. Because the tree spans large evolutionary distances, its topology has been difficult to establish. By means of sequences from elongation factor Tu (EF-Tu), the most conserved protein sequence known to span the tree of life, it is shown that specific alignment orders systematically favor alternative trees. In particular, if taxa A and B are pairwise aligned and if C and D are pairwise aligned, the resulting alignment of the EF-Tu sequences more often gives the tree that has A and B as topological neighbors and C and D as topological neighbors, regardless of the tree reconstruction algorithm used. Because all three reconstruction algorithms produced the same tree for any particular alignment, unequal rate effects appear to be secondary for EF-Tu sequences. This indicates that order-dependent alignment biases are distinct from unequal rate effects and that, for some data, they could be as important as unequal rate effects.

Pairwise alignments of protein sequences were performed with the ALIGN program available in the Dayhoff package (Dayhoff et al. 1983). The penalty for a break was 6, and the mutation data matrix corresponded to 250 accepted point mutations with a bias of +2. These are reasonable values for the weights and correspond to those used in the examples in the description of the ALIGN program. [For an insightful discussion of alignment weights, see the paper by Fitch and Smith (1983); also see Waterman and Perlwitz (1984).] EF-Tu sequences were aligned as protein sequences to obtain more robust alignments and were back-translated into nucleic acid sequences (e.g., phe was translated as UUY, leu as YUN, arg as NGN, and ser as NNN) so that the maximum-, transversion-, and evolutionary-parsimony methods could be compared by equivalent data. Only positions consisting of a single nucleotide (i.e., U, C, A, or G but not R, Y, or N) in each of the four sequences were scored. These uniquely defined replacement sites are presumed to correspond to the most conserved nucleotide positions. A multiple alignment of four sequences can be achieved by successively aligning 1.
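The back-translation and site-filtering steps described in the methods can be illustrated with a minimal Python sketch. It is not the code used in the paper. It assumes the standard genetic code and a collapsing rule inferred only from the four examples given (phe, leu, arg, ser): a codon position is written as the single nucleotide if all synonymous codons agree, as R if they are all purines, as Y if they are all pyrimidines, and as N otherwise. The function names (`degenerate_symbol`, `back_translate`, `scorable_columns`) are hypothetical.

```python
from itertools import product

# Standard genetic code in U, C, A, G order (assumption: the paper's full
# back-translation table is not given, so it is derived from this code).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def degenerate_symbol(nucleotides):
    """Collapse the nucleotides seen at one codon position to U/C/A/G, R, Y, or N."""
    seen = set(nucleotides)
    if len(seen) == 1:
        return next(iter(seen))
    if seen <= {"A", "G"}:
        return "R"  # purines only
    if seen <= {"U", "C"}:
        return "Y"  # pyrimidines only
    return "N"      # any other mixture


def back_translate_residue(aa):
    """Back-translate one amino acid (one-letter code) to a degenerate codon."""
    if aa == "-":
        return "---"  # keep alignment gaps as gaps
    codons = [c for c, a in CODON_TABLE.items() if a == aa]
    if not codons:
        return "NNN"  # unknown residue: fully ambiguous
    return "".join(degenerate_symbol(c[i] for c in codons) for i in range(3))


def back_translate(aligned_protein):
    """Back-translate an aligned protein sequence, residue by residue."""
    return "".join(back_translate_residue(aa) for aa in aligned_protein)


def scorable_columns(aligned_proteins):
    """Return nucleotide column indices that are unambiguous in every sequence."""
    seqs = [back_translate(p) for p in aligned_proteins]
    return [i for i, col in enumerate(zip(*seqs)) if all(n in BASES for n in col)]


if __name__ == "__main__":
    for aa in "FLRS":
        print(aa, back_translate_residue(aa))
    print(scorable_columns(["MF", "MF", "ML", "MW"]))
```

Under these assumptions the sketch reproduces the four back-translations quoted above (UUY, YUN, NGN, NNN). Columns containing R, Y, N, or a gap in any of the four sequences are excluded, so only uniquely defined sites are scored.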

Keywords

Tree (set theory), Pairwise comparison, Tree rearrangement, Biology, Sequence (biology), Topology (electrical circuits), Selection (genetic algorithm), Evolutionary biology, Phylogenetic tree, Algorithm, Mathematics, Combinatorics, Computer science, Artificial intelligence, Genetics, Statistics, Gene

Publication Info

Year: 1991
Type: letter
Volume: 8
Issue: 3
Pages: 378-85
Citations: 150
Access: Closed

Cite This

APA Style

James A. Lake (1991). The order of sequence alignment can bias the selection of tree topology. Molecular Biology and Evolution, 8(3), 378-85. https://doi.org/10.1093/oxfordjournals.molbev.a040654

Identifiers

DOI: 10.1093/oxfordjournals.molbev.a040654