Abstract

The most popular way of comparing the performance of multiple sequence alignment programs is empirical testing on sets of test sequences. Several such test sets now exist, each with potential strengths and weaknesses. We apply several different alignment packages to six benchmark datasets and compare their relative performances. HOMSTRAD, a collection of alignments of homologous proteins, is regularly used as a benchmark for sequence alignment even though it was not designed as such and lacks annotation of the reliable regions within each alignment. We introduce this annotation into HOMSTRAD using protein structural superposition. Results on each database show that method performance depends on the input sequences. Alignment benchmarks are regularly used in combination to measure performance across a spectrum of alignment problems, and combining benchmarks makes it possible to detect whether a program has been over-optimised for a single dataset or a single type of alignment problem.
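
The abstract does not spell out the scoring protocol, but benchmark comparisons of this kind are typically made by scoring each program's output against a reference alignment, for example with a column score that counts how many reference columns are reproduced exactly. The sketch below is a simplified, assumed illustration of that general idea, not the scoring procedure used in the paper; the function names and toy sequences are invented for the example.

```python
# Minimal sketch of a total-column style score: the fraction of
# reference-alignment columns reproduced exactly in a test alignment.
# Illustrative only; not the paper's exact scoring protocol.

def aligned_columns(alignment):
    """Map each column of an alignment (list of equal-length gapped strings)
    to the tuple of residues it aligns, identified by (sequence, residue index),
    with None marking a gap."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        key = []
        for i, ch in enumerate(col):
            if ch == '-':
                key.append(None)
            else:
                key.append((i, counters[i]))
                counters[i] += 1
        columns.append(tuple(key))
    return columns

def column_score(test, reference):
    """Fraction of reference columns containing at least two residues
    that appear unchanged in the test alignment."""
    test_cols = set(aligned_columns(test))
    ref_cols = [c for c in aligned_columns(reference)
                if sum(k is not None for k in c) >= 2]
    if not ref_cols:
        return 0.0
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)

# Toy example (hypothetical sequences, not from HOMSTRAD):
reference = ["AC-GT", "ACAGT"]
test      = ["ACG-T", "ACAGT"]
print(column_score(test, reference))  # 0.75
```

Restricting the scoring to annotated reliable regions, as the paper describes adding to HOMSTRAD, would amount to filtering the reference columns before computing the score.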

Keywords

Benchmark (surveying), Multiple sequence alignment, Computer science, Annotation, Sequence alignment, Sequence (biology), Alignment-free sequence analysis, Structural alignment, Strengths and weaknesses, Data mining, Artificial intelligence, Pattern recognition (psychology), Peptide sequence, Biology, Genetics

Publication Info

Year: 2006
Type: article
Volume: 6
Issue: 4
Pages: 321-339
Citations: 67
Access: Closed

Citation Metrics

67 citations (OpenAlex)

Cite This

Gordon Blackshields, Iain M. Wallace, Mark Larkin et al. (2006). Analysis and Comparison of Benchmarks for Multiple Sequence Alignment. In Silico Biology, 6(4), 321-339. https://doi.org/10.3233/isb-00245

Identifiers

DOI: 10.3233/isb-00245