Abstract

A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising. © 1990 John Wiley & Sons, Inc.
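The pipeline the abstract describes (decompose the term-by-document matrix, keep a small number of factors, map a query into the same factor space, return documents above a cosine threshold) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The toy matrix, the helper names `lsi_fit` and `lsi_query`, the use of raw counts without term weighting, the choice of `k = 2` (the abstract uses ca. 100 factors on a large collection), and the threshold of 0.9 are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy collection: rows are terms, columns are documents.
TERMS = ["human", "computer", "interface", "graph", "tree", "minors"]
X = np.array([
    [1, 0, 1, 0, 0, 0, 0],  # human
    [1, 1, 0, 0, 0, 0, 0],  # computer
    [1, 1, 1, 0, 0, 0, 0],  # interface
    [0, 0, 0, 1, 1, 1, 0],  # graph
    [0, 0, 0, 1, 1, 0, 1],  # tree
    [0, 0, 0, 0, 1, 1, 1],  # minors
], dtype=float)


def lsi_fit(term_doc, k):
    """Truncated SVD: keep the k largest factors of the term-by-document matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]                          # term-to-factor map
    doc_vecs = (Vt[:k, :] * s[:k, None]).T  # one k-item vector of factor weights per document
    return U_k, doc_vecs


def lsi_query(term_weights, U_k, doc_vecs, threshold):
    """Fold a query into factor space as a pseudo-document and apply a cosine threshold."""
    q = term_weights @ U_k                  # weighted combination of term vectors
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q)
    cosines = doc_vecs @ q / np.maximum(norms, 1e-12)  # guard against zero-length vectors
    hits = np.flatnonzero(cosines >= threshold)
    return hits, cosines


U_k, doc_vecs = lsi_fit(X, k=2)

# Query containing only the term "human".
query = np.zeros(len(TERMS))
query[TERMS.index("human")] = 1.0

hits, cosines = lsi_query(query, U_k, doc_vecs, threshold=0.9)
print("documents returned:", hits.tolist())
```

In this toy collection, document 1 contains "computer" and "interface" but not "human", yet it shares a factor with the documents that do, so it is returned for the query "human". That is the effect the abstract attributes to the implicit higher-order structure: relevance is judged in factor space rather than by literal term overlap.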

Keywords

Singular value decomposition, Latent semantic analysis, Computer science, Search engine indexing, Information retrieval, Set (abstract data type), Basis (linear algebra), Cosine similarity, Vector space model, Term (time), Matrix (chemical analysis), Data mining, Algorithm, Pattern recognition (psychology), Mathematics, Artificial intelligence

Related Publications

Principal component analysis

Abstract Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter‐correlated quantitative d...

2010 Wiley Interdisciplinary Reviews Compu... 9554 citations

Publication Info

Year: 1990
Type: article
Volume: 41
Issue: 6
Pages: 391-407
Citations: 12614
Access: Closed

Citation Metrics

12614 (OpenAlex)

Cite This

Scott Deerwester, Susan Dumais, George W. Furnas et al. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407. https://doi.org/10.1002/(sici)1097-4571(199009)41:6<391::aid-asi1>3.0.co;2-9

Identifiers

DOI: 10.1002/(sici)1097-4571(199009)41:6<391::aid-asi1>3.0.co;2-9