Abstract

The most popular way of comparing the performance of multiple sequence alignment programs is empirical testing on sets of test sequences. Several such test sets now exist, each with potential strengths and weaknesses. We apply several different alignment packages to six benchmark datasets and compare their relative performances. HOMSTRAD, a collection of alignments of homologous proteins, is regularly used as a benchmark for sequence alignment even though it was not designed as such and lacks annotation of the reliable regions within each alignment. We introduce this annotation into HOMSTRAD using protein structural superposition. Results on each database show that method performance depends on the input sequences. Alignment benchmarks are regularly used in combination to measure performance across a spectrum of alignment problems, and combining benchmarks makes it possible to detect whether a program has been over-optimised for a single dataset or a single type of alignment problem.
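
The abstract does not spell out the scoring protocol, but benchmark comparisons of this kind are typically made by scoring each program's output against a reference alignment, for example with a column score that counts how many reference columns are reproduced exactly. The sketch below is a simplified, assumed illustration of that general idea, not the scoring procedure used in the paper; the function names and toy sequences are invented for the example.

```python
# Minimal sketch of a total-column style score: the fraction of
# reference-alignment columns reproduced exactly in a test alignment.
# Illustrative only; not the paper's exact scoring protocol.

def aligned_columns(alignment):
    """Map each column of an alignment (list of equal-length gapped strings)
    to the tuple of residues it aligns, identified by (sequence, residue index),
    with None marking a gap."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        key = []
        for i, ch in enumerate(col):
            if ch == '-':
                key.append(None)
            else:
                key.append((i, counters[i]))
                counters[i] += 1
        columns.append(tuple(key))
    return columns

def column_score(test, reference):
    """Fraction of reference columns containing at least two residues
    that appear unchanged in the test alignment."""
    test_cols = set(aligned_columns(test))
    ref_cols = [c for c in aligned_columns(reference)
                if sum(k is not None for k in c) >= 2]
    if not ref_cols:
        return 0.0
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)

# Toy example (hypothetical sequences, not from HOMSTRAD):
reference = ["AC-GT", "ACAGT"]
test      = ["ACG-T", "ACAGT"]
print(column_score(test, reference))  # 0.75
```

Restricting the scoring to annotated reliable regions, as the paper describes adding to HOMSTRAD, would amount to filtering the reference columns before computing the score.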

Keywords

Benchmark (surveying), Multiple sequence alignment, Computer science, Annotation, Sequence alignment, Sequence (biology), Alignment-free sequence analysis, Structural alignment, Strengths and weaknesses, Data mining, Artificial intelligence, Pattern recognition (psychology), Peptide sequence, Biology, Genetics

Publication Info

Year: 2006
Type: article
Volume: 6
Issue: 4
Pages: 321-339
Citations: 67
Access: Closed

Citation Metrics

67 citations (OpenAlex)

Cite This

Gordon Blackshields, Iain M. Wallace, Mark Larkin et al. (2006). Analysis and Comparison of Benchmarks for Multiple Sequence Alignment. In Silico Biology, 6(4), 321-339. https://doi.org/10.3233/isb-00245

Identifiers

DOI: 10.3233/isb-00245