Abstract

The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. Unfortunately, all known techniques for solving this problem fall prey to the "curse of dimensionality." That is, the data structures scale poorly with data dimensionality; in fact, if the number of dimensions exceeds 10 to 20, searching in k-d trees and related structures involves the inspection of a large fraction of the database, thereby doing no better than brute-force linear search. It has been suggested that since the selection of features and the choice of a distance metric in typical applications is rather heuristic, determining an approximate nearest neighbor should suffice for most practic...

Keywords

Nearest neighbor searchComputer scienceHash functionContext (archaeology)Similarity (geometry)Information retrievalData miningLocality-sensitive hashingCurse of dimensionalityTheoretical computer scienceDatabaseHash tableArtificial intelligence

Affiliated Institutions

Related Publications

Publication Info

Year
1999
Type
article
Pages
518-529
Citations
3096
Access
Closed

External Links

Citation Metrics

3096
OpenAlex

Cite This

Aristides Gionis, Piotr Indyk, Rajeev Motwani (1999). Similarity Search in High Dimensions via Hashing. , 518-529.