Search and clustering orders of magnitude faster than BLAST

2010 Bioinformatics 20,899 citations

Abstract

Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification.

Results: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.

Availability: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch

Contact: robert@drive5.com

Supplementary information: Supplementary data are available at Bioinformatics online.
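The abstract describes UCLUST only as a method that "exploits USEARCH to assign sequences to clusters." As a rough illustration of what identity-threshold clustering of this kind involves, the following is a minimal sketch of greedy centroid-based clustering in Python. It is not UCLUST's implementation. The ranking of candidate centroids by shared k-mer words, the `max_rejects` cutoff, and the Levenshtein-based `identity` function are assumptions standing in for real alignment-based search, and all function names here are hypothetical.

```python
from typing import List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of distinct length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identity(a: str, b: str) -> float:
    """Approximate fractional identity from Levenshtein distance.

    Assumption: a simple stand-in for a real pairwise alignment score.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    longest = max(len(a), len(b), 1)
    return (longest - prev[-1]) / longest


def greedy_cluster(seqs: List[str], min_id: float = 0.9, k: int = 3,
                   max_rejects: int = 8) -> Tuple[List[int], List[int]]:
    """Hypothetical greedy clustering.

    Returns (indices of centroid sequences, cluster index of each input sequence).
    """
    centroids: List[int] = []
    centroid_words: List[Set[str]] = []
    assignment: List[int] = []
    for i, seq in enumerate(seqs):
        words = kmers(seq, k)
        # Assumed heuristic: try centroids sharing the most words first.
        order = sorted(range(len(centroids)),
                       key=lambda c: len(words & centroid_words[c]),
                       reverse=True)
        hit = None
        for rejected, c in enumerate(order):
            if rejected >= max_rejects:
                break  # give up after too many failed candidates
            if identity(seq, seqs[centroids[c]]) >= min_id:
                hit = c
                break
        if hit is None:
            # No centroid accepted: this sequence founds a new cluster.
            centroids.append(i)
            centroid_words.append(words)
            hit = len(centroids) - 1
        assignment.append(hit)
    return centroids, assignment
```

For example, `greedy_cluster(["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "ACGTACGTAC"], min_id=0.9)` returns centroids `[0, 2]` and assignments `[0, 0, 1, 0]`. Because each sequence is compared only against existing centroids, input order determines which sequences become centroids; greedy tools of this kind commonly sort input (for example, by decreasing length) before clustering.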

Keywords

Cluster analysis, Computer science, Sequence (biology), Sensitivity (control systems), Data mining, Exploit, Pattern recognition (psychology), Machine learning, Artificial intelligence, Biology

Publication Info

Year: 2010
Type: article
Volume: 26
Issue: 19
Pages: 2460-2461
Citations: 20,899
Access: Closed

Citation Metrics

20,899 (OpenAlex)

Cite This

R. C. Edgar (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460-2461. https://doi.org/10.1093/bioinformatics/btq461

Identifiers

DOI: 10.1093/bioinformatics/btq461