Abstract
Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming increasingly common due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performance has not yet been sufficiently assessed, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by comparison with other known methods, including computationally intensive ones.
Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Two other programs available for large MSAs, Clustal Omega and UPP, were also included in the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 across the three datasets, suggesting that they are safer to use.
Availability and Implementation: http://mafft.cbrc.jp/alignment/software/
Contact: katoh@ifrec.osaka-u.ac.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
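To make the term "chained guide tree" concrete, the following is a minimal sketch of the tree shape only, written as Newick strings. It is not MAFFT's implementation and does not reproduce any method evaluated in the paper: the function names (`chained_tree`, `balanced_tree`, `nesting_depth`) and the sequence labels are hypothetical, the chain simply follows the input order, and the balanced tree is an arbitrary halving used for contrast rather than a tree estimated from sequence distances.

```python
def chained_tree(names):
    """Newick string for a chained (caterpillar) tree: each sequence
    is joined, one at a time, to the group built so far."""
    if not names:
        raise ValueError("need at least one sequence name")
    tree = names[0]
    for name in names[1:]:
        tree = f"({tree},{name})"
    return tree + ";"


def balanced_tree(names):
    """Newick string for a balanced tree built by recursive halving.
    Illustrative contrast only; not derived from sequence distances."""
    if not names:
        raise ValueError("need at least one sequence name")

    def build(group):
        if len(group) == 1:
            return group[0]
        mid = len(group) // 2
        return f"({build(group[:mid])},{build(group[mid:])})"

    return build(list(names)) + ";"


def nesting_depth(newick):
    """Maximum parenthesis nesting depth of a Newick string."""
    deepest = current = 0
    for ch in newick:
        if ch == "(":
            current += 1
            deepest = max(deepest, current)
        elif ch == ")":
            current -= 1
    return deepest


if __name__ == "__main__":
    labels = [f"s{i}" for i in range(1, 9)]  # hypothetical sequence labels
    for build in (chained_tree, balanced_tree):
        tree = build(labels)
        print(build.__name__, tree, "depth:", nesting_depth(tree))
```

In a progressive alignment, the guide tree fixes the order in which sequences and groups are merged. With a chained tree of N sequences, every merge adds a single sequence to one growing alignment, so the nesting depth is N - 1; a balanced tree over the same sequences merges groups of similar size and has a depth of about log2(N).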
Related Publications
Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments
Motivation: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for t...
Protein multiple sequence alignment benchmarking through secondary structure prediction
Motivation: Multiple sequence alignment (MSA) is commonly used to analyze sets of homologous protein or DNA sequences. This has led to the development of many methods a...
PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences
Motivation: To construct a multiple sequence alignment (MSA) of a large number (>∼10 000) of sequences, the calculation of a guide tree with a complexity of O(N2...
Simple chained guide trees give high-quality protein multiple sequence alignments
Significance To make sense of protein sequences, they need to be compared with each other. It is common to make a multiple sequence alignment where gaps are inserted to line up ...
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i)...
Publication Info
- Year: 2016
- Type: article
- Volume: 32
- Issue: 21
- Pages: 3246-3251
- Citations: 354
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/btw412