Context-sensitive learning methods for text categorization

William W. Cohen; Yoram Singer

doi:10.1145/306686.306688

Abstract

Two recently implemented machine-learning algorithms, RIPPER and sleeping-experts for phrases, are evaluated on a number of large text categorization problems. These algorithms both construct classifiers that allow the “context” of a word w to affect how (or even whether) the presence or absence of w will contribute to a classification. However, RIPPER and sleeping-experts differ radically in many other respects: differences include different notions as to what constitutes a context, different ways of combining contexts to construct a classifier, different methods to search for a combination of contexts, and different criteria as to what contexts should be included in such a combination. In spite of these differences, both RIPPER and sleeping-experts perform extremely well across a wide variety of categorization problems, generally outperforming previously applied learning methods. We view this result as a confirmation of the usefulness of classifiers that represent contextual information.
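To make the idea of a word's context affecting its contribution more concrete, the following is a minimal illustrative sketch; it is not the RIPPER or sleeping-experts implementation. It assumes, purely for illustration, that a context for a word is a set of other words that must co-occur with it in the same document. The rule set, the `tokenize` and `classify` helpers, and the example words are all hypothetical.

```python
from typing import FrozenSet, List

# Hypothetical rule set (an assumption for illustration): each rule is a set
# of words that must ALL appear in a document for the rule to fire. The word
# "bank" contributes to the category only together with its context words.
RULES: List[FrozenSet[str]] = [
    frozenset({"bank", "loan"}),
    frozenset({"bank", "interest", "rate"}),
]


def tokenize(text: str) -> FrozenSet[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return frozenset(text.lower().split())


def classify(text: str, rules: List[FrozenSet[str]] = RULES) -> bool:
    """Return True if every word of at least one rule occurs in the document."""
    words = tokenize(text)
    return any(rule <= words for rule in rules)


if __name__ == "__main__":
    print(classify("the bank approved the loan"))
    print(classify("we sat on the river bank"))
```

In this sketch the presence of "bank" alone never triggers the category; whether it counts depends on which other words accompany it, which is the sense of "context" the sketch is meant to convey.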

Keywords

Categorization, Computer science, Artificial intelligence, Construct (python library), Machine learning, Variety (cybernetics), Classifier (UML), Natural language processing, Context (archaeology)

Affiliated Institutions

AT&T (United States)

Related Publications

Machine learning in automated text categorization

Fabrizio Sebastiani

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of ...

2002 ACM Computing Surveys 7805 citations

Decision combination in multiple classifier systems

Tin Kam Ho, J.J. Hull, Sargur N. Srihari

A multiple classifier system is a powerful solution to difficult pattern recognition problems involving large class sets and noisy input because it allows simultaneous use of ar...

1994 IEEE Transactions on Pattern Analysis... 1507 citations

Experiments with a new boosting algorithm

Yoav Freund, Robert E. Schapire

In an earlier paper, we introduced a new "boosting" algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learni...

1996 7561 citations

Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction

Gary M. Weiss, Foster Provost

For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the t...

2003 Journal of Artificial Intelligence Re... 918 citations

Recognition of handwritten digits by combining independent learning vector quantizations

T.K. Ho

Classifiers derived by learning vector quantization (LVQ) have well-defined decision regions that can be combined to construct a more accurate classifier. A given point may be i...

2002 21 citations

Publication Info

Year: 1999
Type: article
Volume: 17
Issue: 2
Pages: 141-173
Citations: 357
Access: Closed

Cite This

APA Style

William W. Cohen, Yoram Singer (1999). Context-sensitive learning methods for text categorization. ACM Transactions on Information Systems, 17(2), 141-173. https://doi.org/10.1145/306686.306688

Identifiers

DOI: 10.1145/306686.306688