F<scp>AST</scp>SP: linear time calculation of alignment accuracy

Abstract

Abstract Motivation: Multiple sequence alignment is a basic part of much biological research, including phylogeny estimation and protein structure and function prediction. Different alignments on the same set of unaligned sequences are often compared, sometimes in order to assess the accuracy of alignment methods or to infer a consensus alignment from a set of estimated alignments. Three of the standard techniques for comparing alignments, Developer, Modeler and Total Column (TC) scores can be derived through calculations of the set of homologies that the alignments share. However, the brute-force technique for calculating this set is quadratic in the input size. The remaining standard technique, Cline Shift Score, inherently requires quadratic time. Results: In this article, we prove that each of these scores can be computed in linear time, and we present FastSP, a linear-time algorithm for calculating these scores. Even on the largest alignments we explored (one with 50 000 sequences), FastSP completed &lt;2 min and used at most 2 GB of the main memory. The best alternative is qscore, a method whose empirical running time is approximately the same as FastSP when given sufficient memory (at least 8 GB), but whose asymptotic running time has never been theoretically established. In addition, for comparisons of large alignments under lower memory conditions (at most 4 GB of main memory), qscore uses substantial memory (up to 10 GB for the datasets we studied), took more time and failed to analyze the largest datasets. Availability: The open-source software and executables are available online at http://www.cs.utexas.edu/~phylo/software/fastsp/. Contact: tandy@cs.utexas.edu

Keywords

ExecutableSet (abstract data type)Computer scienceMultiple sequence alignmentAlgorithmFunction (biology)SoftwareSequence (biology)Time complexityQuadratic equationSequence alignmentMathematicsBiology

Affiliated Institutions

The University of Texas at Austin US

Related Publications

Time and memory efficient likelihood-based tree searches on phylogenomic alignments with missing data

Alexandros Stamatakis , Nikolaos Alachiotis

Abstract Motivation: The current molecular data explosion poses new challenges for large-scale phylogenomic analyses that can comprise hundreds or even thousands of genes. A pro...

2010 Bioinformatics 128 citations

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Haoyi Zhou , Shanghang Zhang , Jieqi Peng +4 more

Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands ...

2021 Proceedings of the AAAI Conference on... 4749 citations

Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks.

Roman L. Tatusov , Stephen F. Altschul , Eugene V. Koonin

We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserve...

1994 Proceedings of the National Academy o... 286 citations

Generating consensus sequences from partialorder multiple sequence alignment graphs

Christopher J. Lee

Abstract Motivation: Consensus sequence generation is important in many kinds of sequence analysis ranging from sequence assembly to profile-based iterative search methods. Howe...

2003 Bioinformatics 99 citations

DCSE, an interactive tool for sequence alignment and secondary structure research

Peter De Rijk , Rupert De Wächter

DCSE provides a user-friendly package for the creation and editing of sequence alignments. The program runs on different platforms, including microcomputers and workstations. Ap...

1993 Computer applications in the biosciences 251 citations

Publication Info

Year: 2011
Type: article
Volume: 27
Issue: 23
Pages: 3250-3258
Citations: 63
Access: Closed

External Links

View on DOI.org

Social Impact

Altmetric

F<scp>AST</scp>SP: linear time calculation of alignment accuracy

PlumX Metrics

Social media, news, blog, policy document mentions

Citation Metrics

OpenAlex

Cite This

APA Style

                            
                                    Siavash Mirarab, 
                                
                                    Tandy Warnow
                                
                            (2011). 
                            F<scp>AST</scp>SP: linear time calculation of alignment accuracy. 
                            Bioinformatics
                            , 27
                            (23)
                            , 3250-3258.
                            https://doi.org/10.1093/bioinformatics/btr553

Identifiers

DOI: 10.1093/bioinformatics/btr553