MUSCLE: multiple sequence alignment with high accuracy and high throughput

Abstract

We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
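The abstract names kmer counting as the basis for fast distance estimation but does not spell out a formula. The following is a minimal sketch of one common way to turn shared kmer counts into a distance without aligning the sequences first. The function names (`kmer_counts`, `kmer_distance`), the default `k = 3`, and the normalization by the number of kmers in the shorter sequence are assumptions made for illustration; this is not taken from the MUSCLE source code, which may differ (for example, in its choice of alphabet).

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x: str, y: str, k: int = 3) -> float:
    """Illustrative k-mer distance: 1 - (shared k-mers / k-mers in the shorter sequence).

    Assumption: k-mers are compared as multisets, so a k-mer that occurs
    twice in both sequences contributes two shared k-mers.
    """
    n_kmers = min(len(x), len(y)) - k + 1
    if n_kmers <= 0:
        raise ValueError("both sequences must be at least k residues long")
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    shared = sum((cx & cy).values())  # multiset intersection: min count per k-mer
    return 1.0 - shared / n_kmers


if __name__ == "__main__":
    # Two short toy peptides that differ only in their last two residues.
    print(kmer_distance("ACDEFG", "ACDEYY", k=3))
```

Because the estimate only needs one pass over each sequence to count kmers, it avoids the pairwise alignment step that dominates the cost of distance calculation for large inputs, which is consistent with the throughput emphasis in the abstract.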

Keywords

Benchmark (surveying), Multiple sequence alignment, Biology, Computer science, Rank (graph theory), Sequence alignment, Source code, Tree (set theory), Mathematics, Combinatorics

Affiliated Institutions

Mill Valley Public Library US

Related Publications

MAFFT version 5: improvement in accuracy of multiple sequence alignment

Kazutaka Katoh

The accuracy of multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i...

2005 Nucleic Acids Research 4851 citations

Generating consensus sequences from partial order multiple sequence alignment graphs

Christopher J. Lee

Motivation: Consensus sequence generation is important in many kinds of sequence analysis ranging from sequence assembly to profile-based iterative search methods. Howe...

2003 Bioinformatics 99 citations

The Jalview Java alignment editor

Michèle Clamp, James Cuff, Stephen M. J. Searle +1 more

Summary: Multiple sequence alignment remains a crucial method for understanding the function of groups of related nucleic acid and protein sequences. However, it is kno...

2004 Bioinformatics 1538 citations

Analysis and Comparison of Benchmarks for Multiple Sequence Alignment

Gordon Blackshields, Iain M. Wallace, Mark Larkin +1 more

The most popular way of comparing the performance of multiple sequence alignment programs is to use empirical testing on sets of test sequences. Several such test sets now exist...

2006 In Silico Biology 67 citations

The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools

Julie Thompson, Toby J. Gibson, Frédéric Plewniak +2 more

CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system...

1997 Nucleic Acids Research 38996 citations

Publication Info

Year: 2004
Type: article
Volume: 32
Issue: 5
Pages: 1792-1797
Citations: 44728
Access: Closed

Cite This

APA Style

                            
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792-1797. https://doi.org/10.1093/nar/gkh340

Identifiers

DOI: 10.1093/nar/gkh340