A Statistical Approach to Mechanized Encoding and Searching of Literary Information

1957 IBM Journal of Research and Development 1,056 citations

Abstract

Written communication of ideas is carried out on the basis of statistical probability in that a writer chooses that level of subject specificity and that combination of words which he feels will convey the most meaning. Since this process varies among individuals and since similar ideas are therefore relayed at different levels of specificity and by means of different words, the problem of literature searching by machines still presents major difficulties. A statistical approach to this problem will be outlined and the various steps of a system based on this approach will be described. Steps include the statistical analysis of a collection of documents in a field of interest, the establishment of a set of “notions” and the vocabulary by which they are expressed, the compilation of a thesaurus-type dictionary and index, the automatic encoding of documents by machine with the aid of such a dictionary, the encoding of topological notations (such as branched structures), the recording of the coded information, the establishment of a searching pattern for finding pertinent information, and the programming of appropriate machines to carry out a search.
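The encoding and searching steps named in the abstract can be illustrated with a small sketch. This is not Luhn's procedure: the thesaurus entries, the notion codes, the function names `encode`, `similarity`, and `search`, and the overlap-based score are all hypothetical choices made here for illustration. The sketch assumes a hand-built thesaurus that maps vocabulary words to notion codes, encodes each text as a count of notions, and ranks documents by how much of the query's notion pattern they share.

```python
from collections import Counter

# Hypothetical thesaurus-type dictionary: vocabulary word -> notion code.
# In the abstract's scheme this would come from statistical analysis of a
# document collection; here it is written by hand for illustration only.
THESAURUS = {
    "car": "VEHICLE",
    "automobile": "VEHICLE",
    "truck": "VEHICLE",
    "engine": "MOTOR",
    "motor": "MOTOR",
    "fuel": "ENERGY",
    "gasoline": "ENERGY",
}


def encode(text, thesaurus=THESAURUS):
    """Encode a text as a Counter of notion codes.

    Words that are not in the thesaurus are ignored.
    """
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(thesaurus[w] for w in words if w in thesaurus)


def similarity(query_code, doc_code):
    """Fraction of the query's notion occurrences also present in the document."""
    total = sum(query_code.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared notion.
    shared = sum((query_code & doc_code).values())
    return shared / total


def search(query, documents, threshold=0.5):
    """Return indices of documents scoring at least `threshold`, best first."""
    q = encode(query)
    scored = [(similarity(q, encode(d)), i) for i, d in enumerate(documents)]
    ranked = sorted(scored, key=lambda t: (-t[0], t[1]))
    return [i for score, i in ranked if score >= threshold]


docs = [
    "The automobile has a gasoline engine.",
    "Poetry of the nineteenth century.",
    "A truck with a new motor.",
]
hits = search("car motor", docs)
```

Because the query and the documents are both reduced to notions before matching, the query "car motor" retrieves documents that use "automobile", "truck", or "engine" instead, which is the vocabulary-variation problem the abstract describes.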

Keywords

Computer science, Encoding (memory), Vocabulary, Set (abstract data type), Carry (investment), Field (mathematics), Notation, Process (computing), Meaning (existential), Artificial intelligence, Natural language processing, Basis (linear algebra), Information retrieval, Thesaurus, Programming language, Linguistics, Mathematics, Arithmetic

Related Publications

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-o...

Edinburgh Research Explorer (Universi... 6994 citations

Lexical relations

One of the essential features of the "Meaning <=> Text" model (MTM) developed by I. A. Mel'chuk et al. is the special lexicon or ECD ('explanatory and combinatory' dictio...

1980 ACM SIGIR Forum 62 citations

Publication Info

Year
1957
Type
article
Volume
1
Issue
4
Pages
309-317
Citations
1056
Access
Closed

Citation Metrics

1056
OpenAlex

Cite This

H. P. Luhn (1957). A Statistical Approach to Mechanized Encoding and Searching of Literary Information. IBM Journal of Research and Development, 1(4), 309-317. https://doi.org/10.1147/rd.14.0309

Identifiers

DOI
10.1147/rd.14.0309