Abstract
A probabilistic analysis of the Rocchio relevance feedback algorithm, one of the most popular learning methods from information retrieval, is presented in a text categorization framework. The analysis results in a probabilistic version of the Rocchio classifier and offers an explanation for the TFIDF word weighting heuristic. The Rocchio classifier, its probabilistic variant and a standard naive Bayes classifier are compared on three text categorization tasks. The results suggest that the probabilistic algorithms are preferable to the heuristic Rocchio classifier.

This research is sponsored by the Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant F33615-93-1-1330. The US Government is authorized to reproduce and distribute reprints for Government purposes, notwithstanding any copyright notation thereon. Views and conclusions contained in this document are those of the authors and should not be ...
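For readers unfamiliar with the heuristic Rocchio classifier that the abstract refers to, the following is a minimal sketch of the general technique, not the paper's exact formulation. It assumes a common TFIDF variant (raw term frequency times log(N/df), with L2 normalization), illustrative weights `alpha` and `beta`, and cosine similarity for classification. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def to_vector(doc, idf):
    """Map a tokenized document to an L2-normalized TFIDF vector (a dict)."""
    tf = Counter(w for w in doc if w in idf)
    vec = {w: tf[w] * idf[w] for w in tf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}

def train_rocchio(docs, labels, alpha=1.0, beta=0.25):
    """Build one prototype per class: alpha * mean(positives) - beta * mean(negatives).

    alpha and beta are illustrative values (an assumption), not the paper's settings.
    """
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    idf = {w: math.log(n / df[w]) for w in df}
    vectors = [to_vector(doc, idf) for doc in docs]
    prototypes = {}
    for c in set(labels):
        pos = [v for v, y in zip(vectors, labels) if y == c]
        neg = [v for v, y in zip(vectors, labels) if y != c]
        proto = defaultdict(float)
        for v in pos:
            for w, x in v.items():
                proto[w] += alpha * x / len(pos)
        for v in neg:
            for w, x in v.items():
                proto[w] -= beta * x / len(neg)
        # Keep only positive components (negative weights are clipped to zero).
        prototypes[c] = {w: x for w, x in proto.items() if x > 0}
    return prototypes, idf

def classify(doc, prototypes, idf):
    """Return the class whose prototype has the highest cosine similarity to doc."""
    vec = to_vector(doc, idf)  # already unit length, so only the prototype norm is needed
    def cosine(proto):
        norm = math.sqrt(sum(x * x for x in proto.values()))
        if not norm:
            return 0.0
        return sum(vec.get(w, 0.0) * x for w, x in proto.items()) / norm
    return max(prototypes, key=lambda c: cosine(prototypes[c]))

# Toy usage with made-up data.
docs = [["ball", "goal", "team"], ["team", "match", "goal"],
        ["stock", "market", "price"], ["market", "trade", "price"]]
labels = ["sport", "sport", "finance", "finance"]
prototypes, idf = train_rocchio(docs, labels)
print(classify(["goal", "match"], prototypes, idf))
```

On this toy data the query shares words only with the "sport" documents, so that class is selected. The probabilistic variant and the naive Bayes classifier compared in the paper are not shown here.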
Related Publications
HMM-based passage models for document classification and ranking
We present an application of Hidden Markov Models to supervised document classification and ranking. We consider a family of models that take into account the fact that relevant...
A comparison of event models for naive Bayes text classification
Recent work in text classification has used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-va...
Machine learning in automated text categorization
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of ...
Using Maximum Entropy for Text Classification
This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety o...
On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
We compare discriminative and generative learning as typified by logistic regression and naive Bayes. We show, contrary to a widely held belief that discriminative classifiers are al...
Publication Info
- Year: 1997
- Type: article
- Pages: 143-151
- Citations: 1265
- Access: Closed