Abstract
We propose an improved version of the neighbor-joining (NJ) algorithm of Saitou and Nei. This new algorithm, BIONJ, follows the same agglomerative scheme as NJ, which consists of iteratively picking a pair of taxa, creating a new mode which represents the cluster of these taxa, and reducing the distance matrix by replacing both taxa by this node. Moreover, BIONJ uses a simple first-order model of the variances and covariances of evolutionary distance estimates. This model is well adapted when these estimates are obtained from aligned sequences. At each step it permits the selection, from the class of admissible reductions, of the reduction which minimizes the variance of the new distance matrix. In this way, we obtain better estimates to choose the pair of taxa to be agglomerated during the next steps. Moreover, in comparison with NJ's estimates, these estimates become better and better as the algorithm proceeds. BIONJ retains the good properties of NJ--especially its low run time. Computer simulations have been performed with 12-taxon model trees to determine BIONJ's efficiency. When the substitution rates are low (maximum pairwise divergence approximately 0.1 substitutions per site) or when they are constant among lineages, BIONJ is only slightly better than NJ. When the substitution rates are higher and vary among lineages,BIONJ clearly has better topological accuracy. In the latter case, for the model trees and the conditions of evolution tested, the topological error reduction is on the average around 20%. With highly-varying-rate trees and with high substitution rates (maximum pairwise divergence approximately 1.0 substitutions per site), the error reduction may even rise above 50%, while the probability of finding the correct tree may be augmented by as much as 15%.
Keywords
Affiliated Institutions
Related Publications
Estimation of evolutionary distances between homologous nucleotide sequences.
By using two models of evolutionary base substitutions--"three-substitution-type" and "two-frequency-class" models--some formulae are derived which permit a simple estimation of...
Success of Phylogenetic Methods in the Four-Taxon Case
The success of 16 methods of phylogenetic inference was examined using consistency and simulation analysis. Success—the frequency with which a tree-making method correctly ident...
Learning to Count: Robust Estimates for Labeled Distances between Molecular Sequences
Researchers routinely estimate distances between molecular sequences using continuous-time Markov chain models. We present a new method, robust counting, that protects against t...
The order of sequence alignment can bias the selection of tree topology.
Sequential pairwise alignment of multiple sequences is a widely used procedure (Kruskal 1983 ).It is useful and generally successful when sequences within a set differ by relati...
Bayesian Phylogenetic Inference via Markov Chain Monte Carlo Methods
Summary. We derive a Markov chain to sample from the posterior distribution for a phylogenetic tree given sequence information from the corresponding set of organisms, a stochas...
Publication Info
- Year
- 1997
- Type
- article
- Volume
- 14
- Issue
- 7
- Pages
- 685-695
- Citations
- 1784
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1093/oxfordjournals.molbev.a025808