Abstract
A number of techniques have been studied for the automatic assignment of controlled subject headings and classifications from free indexing. These techniques involve the automatic manipulation and truncation of the free‐index phrases assigned to a document, and the use of a manually‐constructed thesaurus and automatically‐generated dictionaries together with statistical ranking and weighting methods. The ranking and weighting methods are based on a statistically‐generated ‘adhesion coefficient’, which reflects the degree of association between the free‐indexing terms, the controlled subject headings, and the classifications. By analysing a large sample of manually‐indexed documents, the system generates dictionaries of free‐language and controlled‐language terms together with their associated classifications and adhesion coefficients. Having learnt from the manually‐indexed documents, the system uses these dictionaries in the subsequent automatic classification procedure. The accuracy and cost‐effectiveness of the automatically‐assigned subject headings and classifications have been compared with those of the manual system. The results were encouraging and the costs comparable to those of a manual system.
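The learn-then-classify procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the adhesion coefficient, so the code below substitutes a simple assumed measure: the fraction of training documents containing a free-index term that were also given a particular classification. The truncation rule, the function names (`truncate`, `build_dictionary`, `classify`) and the sample classification codes are all hypothetical and are not taken from the article.

```python
from collections import Counter, defaultdict


def truncate(phrase, length=6):
    """Truncate each word of a free-index phrase (assumed rule, not the article's)."""
    return " ".join(word[:length].lower() for word in phrase.split())


def build_dictionary(training_docs):
    """Learn term -> {classification: coefficient} from manually indexed documents.

    ASSUMPTION: the coefficient is the share of documents containing the term
    that were manually assigned the classification. The article's actual
    'adhesion coefficient' is not defined in the abstract.
    """
    term_counts = Counter()
    pair_counts = defaultdict(Counter)
    for free_terms, classifications in training_docs:
        for term in {truncate(t) for t in free_terms}:
            term_counts[term] += 1
            for cls in set(classifications):
                pair_counts[term][cls] += 1
    return {
        term: {cls: n / term_counts[term] for cls, n in by_class.items()}
        for term, by_class in pair_counts.items()
    }


def classify(free_terms, dictionary, top_n=3):
    """Rank candidate classifications by summing the coefficients of matching terms."""
    scores = Counter()
    for term in {truncate(t) for t in free_terms}:
        for cls, coefficient in dictionary.get(term, {}).items():
            scores[cls] += coefficient
    return scores.most_common(top_n)


# Hypothetical training sample: (free-index phrases, manually assigned classes).
training = [
    (["nuclear reactors", "heat transfer"], ["A10"]),
    (["nuclear reactors", "radiation shielding"], ["A10", "B20"]),
    (["heat transfer", "fluid flow"], ["C30"]),
]

dictionary = build_dictionary(training)
ranked = classify(["Nuclear reactor", "heat transfer"], dictionary)
print(ranked[0])
```

Summing coefficients is only one possible ranking rule; a weighted or thresholded combination would fit the abstract's description equally well.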
Publication Info
- Year: 1975
- Type: article
- Volume: 31
- Issue: 4
- Pages: 246-265
- Citations: 33
- Access: Closed
Identifiers
- DOI: 10.1108/eb026605