Abstract
Written communication of ideas is carried out on the basis of statistical probability: a writer chooses the level of subject specificity and the combination of words that he feels will convey the most meaning. Because this process varies among individuals, similar ideas are relayed at different levels of specificity and by means of different words, and literature searching by machine therefore still presents major difficulties. A statistical approach to this problem is outlined, and the steps of a system based on this approach are described. These steps include statistical analysis of a collection of documents in a field of interest; establishment of a set of “notions” and the vocabulary by which they are expressed; compilation of a thesaurus-type dictionary and index; automatic encoding of documents by machine with the aid of such a dictionary; encoding of topological notations (such as branched structures); recording of the coded information; establishment of a searching pattern for finding pertinent information; and programming of appropriate machines to carry out a search.
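As a rough illustration of the frequency-analysis, dictionary-encoding, and searching steps named in the abstract, the Python sketch below counts word frequencies, maps words to "notion" codes through a small thesaurus, and treats a document as pertinent when it shares enough notion codes with a query. The thesaurus entries, the codes N1 to N4, the function names, and the `min_shared` threshold are all hypothetical. This is a simplified sketch under those assumptions, not the system described in the paper, and it does not cover the encoding of topological notations such as branched structures.

```python
import re
from collections import Counter

# Hypothetical thesaurus-type dictionary: several surface words map to one
# "notion" code, so different wordings of the same idea encode identically.
# Entries and codes are illustrative assumptions only.
THESAURUS = {
    "retrieval": "N1", "searching": "N1", "search": "N1",
    "document": "N2", "documents": "N2", "literature": "N2",
    "machine": "N3", "machines": "N3", "computer": "N3",
    "dictionary": "N4", "thesaurus": "N4", "index": "N4",
}


def word_frequencies(text: str) -> Counter:
    """Count lowercase alphabetic words in one text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def collection_frequencies(docs: list[str]) -> Counter:
    """Statistical analysis of a collection: summed word counts over all docs."""
    return sum((word_frequencies(d) for d in docs), Counter())


def encode(text: str) -> set[str]:
    """Automatic encoding: replace known words by their notion codes."""
    return {THESAURUS[w] for w in word_frequencies(text) if w in THESAURUS}


def matches(query: str, document: str, min_shared: int = 2) -> bool:
    """Hypothetical search pattern: require at least `min_shared` common notions."""
    return len(encode(query) & encode(document)) >= min_shared


if __name__ == "__main__":
    docs = [
        "Searching the literature by machine",
        "A thesaurus index for documents",
        "Chemical structures",
    ]
    query = "machine retrieval of documents"
    print([d for d in docs if matches(query, d)])
```

Here the query and the first document use different words ("retrieval" versus "searching", "documents" versus "literature") yet encode to the same notions, which is the kind of vocabulary variation the abstract identifies as the core difficulty. Running the script prints `['Searching the literature by machine']`.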
Publication Info
- Year: 1957
- Type: article
- Volume: 1
- Issue: 4
- Pages: 309-317
- Citations: 1056
- Access: Closed
Identifiers
- DOI: 10.1147/rd.14.0309