Abstract

In metagenomic analyses, DNA sequences are translated into protein-coding sequences and then assigned to protein families, because sensitive homology detection is required. However, with the huge volumes of sequence data now being produced, even a standard homology search with BLASTX becomes difficult in terms of computational cost. We designed a new homology search algorithm that finds seed sequences using suffix arrays of both the query and the database, and implemented it as GHOSTX. GHOSTX ran approximately 131 to 165 times faster than BLASTX at a similar level of sensitivity. GHOSTX is distributed under the BSD 2-clause license and is available for download at http://www.bi.cs.titech.ac.jp/ghostx/. Sequencing technology continues to improve, and sequencers are producing ever larger quantities of data. This growth makes computational analysis with contemporary tools increasingly difficult, and we offer GHOSTX as a potential solution to this problem.
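
To make the idea of suffix-array-based seed finding concrete, the following is a minimal sketch and not the GHOSTX implementation. It assumes exact-match seeds of a fixed length `k`, a naive suffix array construction, and the hypothetical helper names `build_suffix_array` and `find_exact_seeds`; the abstract does not specify how GHOSTX scores or selects its seeds. Walking the query suffix array visits identical k-mers consecutively, so each distinct k-mer needs only one binary search over the database suffix array.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Return suffix start positions in lexicographic order (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact_seeds(query, database, k):
    """Return sorted (query_pos, db_pos) pairs whose length-k substrings are identical.

    Illustrative assumption: seeds are exact k-mer matches.
    """
    query_sa = build_suffix_array(query)
    db_sa = build_suffix_array(database)
    # The length-k prefixes of lexicographically sorted suffixes are also sorted,
    # so they can be binary-searched directly.
    db_prefixes = [database[i:i + k] for i in db_sa]

    seeds = []
    prev_kmer, lo, hi = None, 0, 0
    for q_pos in query_sa:
        kmer = query[q_pos:q_pos + k]
        if len(kmer) < k:
            continue  # suffix too short to yield a full seed
        if kmer != prev_kmer:
            # New k-mer: locate its range of matching database suffixes once.
            lo = bisect_left(db_prefixes, kmer)
            hi = bisect_right(db_prefixes, kmer)
            prev_kmer = kmer
        seeds.extend((q_pos, db_sa[rank]) for rank in range(lo, hi))
    return sorted(seeds)


if __name__ == "__main__":
    # Toy protein-like strings, made up for illustration only.
    print(find_exact_seeds("MKVLA", "AMKVLMKV", 3))
```

In a real search tool the seeds found this way would then be extended into full alignments; that step is omitted here.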

Keywords

Sequence database, Computer science, Compressed suffix array, Smith–Waterman algorithm, Algorithm, Database search engine, Suffix array, Sequence alignment, Homology (biology), Metagenomics, Search algorithm, Suffix tree, Data mining, Computational biology, Data structure, Biology, Search engine, Information retrieval, Genetics, Peptide sequence

Publication Info

Year: 2014
Type: article
Volume: 9
Issue: 8
Pages: e103833
Citations: 91 (OpenAlex)
Access: Closed

Cite This

Shuji Suzuki, Masanori Kakuta, Takashi Ishida et al. (2014). GHOSTX: An Improved Sequence Homology Search Algorithm Using a Query Suffix Array and a Database Suffix Array. PLoS ONE, 9(8), e103833. https://doi.org/10.1371/journal.pone.0103833

Identifiers

DOI: 10.1371/journal.pone.0103833