Abstract
The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. Unfortunately, all known techniques for solving this problem fall prey to the "curse of dimensionality." That is, the data structures scale poorly with data dimensionality; in fact, if the number of dimensions exceeds 10 to 20, searching in k-d trees and related structures involves the inspection of a large fraction of the database, thereby doing no better than brute-force linear search. It has been suggested that since the selection of features and the choice of a distance metric in typical applications is rather heuristic, determining an approximate nearest neighbor should suffice for most practic...
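For reference, the brute-force linear search that the abstract uses as its baseline can be sketched as below. This is a minimal illustration, not the paper's method: the function name `linear_scan_nearest`, the choice of Euclidean distance, and the sample points are all assumptions made for the example.

```python
import math


def linear_scan_nearest(points, query):
    """Return (index, distance) of the point closest to `query`.

    Hypothetical helper for illustration: it inspects every point once,
    so the cost grows linearly with both database size and dimensionality.
    Euclidean distance is an assumption; the abstract does not fix a metric.
    """
    best_index, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = math.dist(p, query)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


# Made-up sample data: three points in three dimensions.
points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (3.0, 4.0, 0.0)]
idx, dist = linear_scan_nearest(points, (0.9, 1.2, 1.0))
print(idx, round(dist, 4))
```

Any index structure has to inspect substantially fewer points than this scan to be worthwhile, which, according to the abstract, k-d trees and related structures fail to do once the number of dimensions exceeds roughly 10 to 20.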
Publication Info
- Year: 1999
- Type: article
- Pages: 518-529
- Citations: 3096
- Access: Closed