Abstract
A multiple sequence alignment program, MAFFT, has been developed. Its CPU time is drastically reduced compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of the volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments, even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced compared with CLUSTALW, with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE when the number of input sequences exceeds 60, without sacrificing accuracy.
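The FFT step in (i) can be illustrated with a minimal sketch. It is not MAFFT's implementation. It assumes each residue maps to a (volume, polarity) pair and that the correlation between two sequences at every relative offset is obtained from one forward and one inverse transform per property. The table `TOY_PROPS` holds made-up values for eight residues, and the names `fft`, `correlation` and `best_lag` are hypothetical helpers introduced here for illustration only.

```python
import cmath

# Made-up (volume, polarity) values for a few residues. Illustration only:
# these are NOT the normalized values used by MAFFT.
TOY_PROPS = {
    "A": (-0.6, 0.1), "G": (-1.2, 0.3), "L": (0.7, -1.0), "K": (0.9, 1.1),
    "D": (-0.3, 1.5), "F": (1.2, -0.9), "W": (1.8, -0.8), "S": (-0.8, 0.4),
}


def fft(x, inverse=False):
    """Recursive radix-2 FFT. len(x) must be a power of two; the inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def correlation(seq1, seq2, props=TOY_PROPS):
    """Return c with c[k] = sum_n v1[n] * v2[n + k], summed over both properties.

    Indices wrap around the zero-padded length, so negative lags sit at the end.
    """
    size = 1
    while size < len(seq1) + len(seq2):  # pad so wrapped terms never overlap
        size *= 2
    total = [0.0] * size
    for comp in range(2):  # 0 = volume, 1 = polarity
        a = [complex(props[r][comp]) for r in seq1] + [0j] * (size - len(seq1))
        b = [complex(props[r][comp]) for r in seq2] + [0j] * (size - len(seq2))
        fa, fb = fft(a), fft(b)
        prod = [x.conjugate() * y for x, y in zip(fa, fb)]
        c = fft(prod, inverse=True)
        for k in range(size):
            total[k] += c[k].real / size  # apply the 1/N scaling of the inverse
    return total


def best_lag(seq1, seq2, props=TOY_PROPS):
    """Offset k at which seq1[n] best matches seq2[n + k]; may be negative."""
    c = correlation(seq1, seq2, props)
    k = max(range(len(c)), key=c.__getitem__)
    return k if k < len(seq2) else k - len(c)


if __name__ == "__main__":
    # The shared segment LKDFW starts at index 2 in the first sequence
    # and at index 4 in the second.
    print(best_lag("AALKDFWSS", "GGGGLKDFWAA"))
```

A peak in the correlation only suggests the offset at which a similar segment may lie. Locating the segment itself and building the alignment are separate steps that this sketch does not attempt.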
Related Publications
Local homology recognition and distance measures in linear time using compressed amino acid alphabets
Methods for discovery of local similarities and estimation of evolutionary distance by identifying k-mers (contiguous subsequences of length k) common to two sequences are descr...
MAFFT version 5: improvement in accuracy of multiple sequence alignment
The accuracy of multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i...
Computational Frameworks for the Fast Fourier Transform
1. The Radix-2 Frameworks. Matrix Notation and Algorithms The FFT Idea The Cooley-Tukey Factorization Weight and Butterfly Computations Bit Reversal and Transposition The Cooley...
PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences
Abstract Motivation: To construct a multiple sequence alignment (MSA) of a large number (>∼10 000) of sequences, the calculation of a guide tree with a complexity of O(N2...
MUSCLE: multiple sequence alignment with high accuracy and high throughput
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting,...
Publication Info
- Year: 2002
- Type: article
- Volume: 30
- Issue: 14
- Pages: 3059-3066
- Citations: 16606
- Access: Closed
Identifiers
- DOI: 10.1093/nar/gkf436