Subject and citation indexing. Part II: The optimal, cluster-based retrieval performance of composite representations

W. M. Shaw

doi:10.1002/(sici)1097-4571(199110)42:9<676::aid-asi6>3.0.co;2-2

Abstract

Measures of cluster-based retrieval effectiveness are computed for five composite representations in the cystic fibrosis (CF) Document Collection. The composite representations are constructed from combinations of two subject representations, based on Medical Subject Headings and subheadings, and two citation representations, consisting of the complete list of cited references and a comprehensive list of citations for each document. Experimental retrieval results are presented as a function of the exhaustivity and similarity of the composite representations and reveal consistent patterns from which optimal performance levels can be identified. The optimal performance values provide an assessment of the absolute capacity of each composite representation to associate documents relevant to the same query and discriminate between documents relevant to different queries in single-link hierarchies. The optimal performance values for all composite representations are completely comparable and are superior to the optimal performance of constituent representations. Optimal performance consistently occurs at low levels of exhaustivity. Exhaustive composite representations that include subject descriptions produce the lowest levels of performance; retrieval results derived from random structures are comparable to the observed results. The effectiveness of the exhaustive representation composed of references and citations is materially superior to the effectiveness of exhaustive composite representations that include subject descriptions. © 1991 John Wiley & Sons, Inc.
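
To make the abstract's terms concrete, the following is a minimal sketch of cutting a single-link hierarchy at a similarity threshold over composite representations. It is not the paper's procedure: the Jaccard measure, the tagging of features by source, the toy documents, and all function names are assumptions introduced for illustration only.

```python
from itertools import combinations


def composite(subject, citation):
    """Composite representation: union of subject and citation features.

    Features are tagged by source so a heading and a reference never collide.
    """
    return {("subj", s) for s in subject} | {("cite", c) for c in citation}


def jaccard(a, b):
    """Jaccard similarity between two feature sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_link_clusters(reps, threshold):
    """Clusters obtained by cutting a single-link hierarchy at `threshold`.

    Two documents share a cluster when a chain of pairwise similarities,
    each >= threshold, connects them (connected components via union-find).
    """
    parent = {doc: doc for doc in reps}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(reps, 2):
        if jaccard(reps[a], reps[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for doc in reps:
        clusters.setdefault(find(doc), set()).add(doc)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical toy documents; headings and references are invented.
docs = {
    "d1": composite({"Cystic Fibrosis", "Lung"}, {"ref1", "ref2"}),
    "d2": composite({"Cystic Fibrosis", "Lung"}, {"ref2", "ref3"}),
    "d3": composite({"Cystic Fibrosis", "Pancreas"}, {"ref3", "ref4"}),
    "d4": composite({"Sweat"}, {"ref9"}),
}

print(single_link_clusters(docs, 0.5))
print(single_link_clusters(docs, 0.3))
```

Lowering the threshold merges clusters through chains of intermediate documents, which is the defining property of single-link hierarchies: in the toy data, d3 joins d1 and d2 at 0.3 only because it is similar enough to d2.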

Keywords

Subject (documents); Information retrieval; Search engine indexing; Citation; Cluster (spacecraft); Computer science; Composite number; World Wide Web; Algorithm

Affiliated Institutions

University of North Carolina at Chapel Hill US

Related Publications

Subject and citation indexing. Part I: The clustering structure of composite representations in the Cystic Fibrosis Document Collection

W. M. Shaw

The presence of clustering structure in the cystic fibrosis (CF) Document Collection is evaluated as a function of the exhaustivity of five composite representations. The compos...

1991 Journal of the American Society for I... 10 citations

TOWARDS AUTOMATIC INDEXING: AUTOMATIC ASSIGNMENT OF CONTROLLED‐LANGUAGE INDEXING AND CLASSIFICATION FROM FREE INDEXING

Barry James Field

A number of techniques have been studied for the automatic assignment of controlled subject headings and classifications from free indexing. These techniques involve the automat...

1975 Journal of Documentation 33 citations

Recent Studies in Automatic Text Analysis and Document Retrieval

Gerard Salton

Many experts in mechanized text processing now agree that useful automatic language analysis procedures are largely unavailable and that the existing linguistic methodologies ge...

1973 Journal of the ACM 44 citations

Sentiment analyzer: extracting sentiments about a given topic using natural language processing techniques

Junbo Yi, Tetsuya Nasukawa, Răzvan Bunescu +1 more

We present sentiment analyzer (SA) that extracts sentiment (or opinion) about a subject from online text documents. Instead of classifying the sentiment of an entire document ab...

2004 719 citations

Using Linear Algebra for Intelligent Information Retrieval

Michael W. Berry, Susan Dumais, Gavin O'Brien

Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users’ requests and those in or assigned to docum...

1995 SIAM Review 1482 citations

Publication Info

Year: 1991
Type: article
Volume: 42
Issue: 9
Pages: 676-684
Citations: 20
Access: Closed

Cite This

APA Style

                            
                                    W. M. Shaw
                                
                            (1991). 
                            Subject and citation indexing. Part II: The optimal, cluster-based retrieval performance of composite representations. 
                            Journal of the American Society for Information Science
                            , 42
                            (9)
                            , 676-684.
                            https://doi.org/10.1002/(sici)1097-4571(199110)42:9<676::aid-asi6>3.0.co;2-2

Identifiers

DOI: 10.1002/(sici)1097-4571(199110)42:9<676::aid-asi6>3.0.co;2-2
