MUSCLE: multiple sequence alignment with high accuracy and high throughput

2004 Nucleic Acids Research 44,728 citations

Abstract

We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
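
The abstract names kmer counting as the basis for fast distance estimation but does not give a formula. The sketch below is only an illustration of the general idea, not MUSCLE's actual implementation: it assumes a distance defined as one minus the fraction of shared k-mers (counted with multiplicity) relative to the number of k-mers in the shorter sequence. The function names `kmer_counts` and `kmer_distance`, the default `k=3`, and the use of the uncompressed amino-acid alphabet are all assumptions made for this example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Illustrative k-mer distance (assumed definition, not taken from the paper).

    Returns 1 - shared / n, where `shared` is the number of k-mers the two
    sequences have in common (counted with multiplicity) and `n` is the
    number of k-mers in the shorter sequence. No alignment is computed.
    """
    n = min(len(a), len(b)) - k + 1
    if n <= 0:
        raise ValueError("both sequences must contain at least k residues")
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # A Counter returns 0 for k-mers it has never seen.
    shared = sum(min(c, counts_b[kmer]) for kmer, c in counts_a.items())
    return 1.0 - shared / n
```

Under this definition, identical sequences have distance 0.0 and sequences with no k-mer in common have distance 1.0. Because each sequence is scanned once and no dynamic programming is involved, a measure of this kind is cheap enough to compute for all pairs before building a guide tree.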

Keywords

Benchmark (surveying), Multiple sequence alignment, Biology, Computer science, Rank (graph theory), Sequence alignment, Source code, Tree (set theory), Mathematics, Combinatorics

Publication Info

Year: 2004
Type: article
Volume: 32
Issue: 5
Pages: 1792-1797
Citations: 44,728
Access: Closed

Citation Metrics

44,728 (OpenAlex)

Cite This

R. C. Edgar (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792-1797. https://doi.org/10.1093/nar/gkh340

Identifiers

DOI
10.1093/nar/gkh340