MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform

Abstract

A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE when the number of input sequences exceeds 60, without sacrificing accuracy.
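Technique (i) can be illustrated with a small sketch. Each residue is mapped to a normalized volume value and a normalized polarity value, and the two resulting numeric sequences are cross-correlated at every lag k; summing the volume and polarity correlations gives a score c(k) whose peaks suggest offsets at which homologous regions may lie. The per-residue values below are approximate Grantham-style numbers assumed for illustration (not necessarily the table MAFFT uses), the mean-0/standard-deviation-1 normalization is an assumption, and all names (`VOLUME`, `POLARITY`, `correlation_by_lag`, `mafft_style_correlation`, `best_lags`) are hypothetical rather than MAFFT's own code. A pure-Python radix-2 FFT keeps the sketch dependency-free; the later steps (extracting segments around peaks and aligning them) are not shown.

```python
import cmath
import statistics

# Approximate residue volumes and polarities (Grantham-style values),
# ASSUMED for illustration; not taken from the MAFFT source.
VOLUME = {"A": 31, "R": 124, "N": 56, "D": 54, "C": 55, "Q": 85, "E": 83,
          "G": 3, "H": 96, "I": 111, "L": 111, "K": 119, "M": 105, "F": 132,
          "P": 32.5, "S": 32, "T": 61, "W": 170, "Y": 136, "V": 84}
POLARITY = {"A": 8.1, "R": 10.5, "N": 11.6, "D": 13.0, "C": 5.5, "Q": 10.5,
            "E": 12.3, "G": 9.0, "H": 10.4, "I": 5.2, "L": 4.9, "K": 11.3,
            "M": 5.7, "F": 5.2, "P": 8.0, "S": 9.2, "T": 8.6, "W": 5.4,
            "Y": 6.2, "V": 5.9}


def _normalize(table):
    """Rescale values over the 20 residues to mean 0, std dev 1 (assumed)."""
    mean = statistics.mean(table.values())
    sd = statistics.pstdev(table.values())
    return {aa: (x - mean) / sd for aa, x in table.items()}


V = _normalize(VOLUME)
P = _normalize(POLARITY)


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    With invert=True the sign of the exponent flips (caller divides by n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def correlation_by_lag(x, y):
    """Return {k: sum_n x[n] * y[n + k]} for k in -(len(x)-1) .. len(y)-1,
    computed as IFFT(conj(FFT(x)) * FFT(y)) on zero-padded inputs."""
    size = 1
    while size < len(x) + len(y) - 1:  # padding prevents circular wrap-around
        size *= 2
    fx = fft(x + [0.0] * (size - len(x)))
    fy = fft(y + [0.0] * (size - len(y)))
    raw = fft([a.conjugate() * b for a, b in zip(fx, fy)], invert=True)
    # Negative lags land at the end of the circular result (index size + k).
    return {k: raw[k % size].real / size
            for k in range(-(len(x) - 1), len(y))}


def mafft_style_correlation(seq1, seq2):
    """c(k) = volume correlation + polarity correlation at each lag k."""
    cv = correlation_by_lag([V[a] for a in seq1], [V[a] for a in seq2])
    cp = correlation_by_lag([P[a] for a in seq1], [P[a] for a in seq2])
    return {k: cv[k] + cp[k] for k in cv}


def best_lags(seq1, seq2, top=3):
    """Lags with the highest c(k): candidate offsets of homologous regions."""
    c = mafft_style_correlation(seq1, seq2)
    return sorted(c, key=c.get, reverse=True)[:top]
```

For example, `best_lags("MKVLAAGIVGLLLAQW", "MKVLAAGIVGLLLAQW")` puts lag 0 first, since a sequence correlates best with itself unshifted. The FFT route costs O(L log L) for padded length L instead of the O(NM) of summing every lag directly, which is where the speed-up in idea (i) comes from.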

Keywords

Fast Fourier transform, Computer science, Multiple sequence alignment, Split-radix FFT algorithm, Sequence (biology), Parallel computing, Benchmark (surveying), Algorithm, Heuristics, Sequence alignment, Fourier transform, Computational science, Biology, Mathematics, Peptide sequence, Fourier analysis, Short-time Fourier transform

Affiliated Institutions

Kyoto University JP

Publication Info

Year: 2002
Type: article
Volume: 30
Issue: 14
Pages: 3059-3066
Citations: 16606
Access: Closed

Cite This

APA Style

Katoh, K., Misawa, K., Kuma, K., & Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14), 3059-3066. https://doi.org/10.1093/nar/gkf436

Identifiers

DOI: 10.1093/nar/gkf436