Abstract
We present an application of Hidden Markov Models to supervised document classification and ranking. We consider a family of models that account for the fact that relevant documents may contain irrelevant passages. The originality of the model is that it does not explicitly segment documents; instead, its final score takes all possible segmentations into account. The model generalizes the multinomial Naive Bayes model and is derived from a more general model for different access tasks. It is evaluated on the REUTERS test collection and compared to the multinomial Naive Bayes model. It is shown to be more robust with respect to training set size and to improve performance for both ranking and classification, especially for classes with few training examples.
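The idea of scoring a document over all segmentations, rather than committing to one, can be illustrated with the HMM forward algorithm. The following is a minimal sketch and not the authors' model: the two-state topology (state 0 "relevant", state 1 "irrelevant"), the toy vocabulary, and every probability value are assumptions made for illustration. It also shows one way such a model can reduce to multinomial Naive Bayes: if the chain is forced to stay in the relevant state, the forward score equals the Naive Bayes log-likelihood.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(words, emit, start, trans):
    """log P(words), summed over every state sequence (every segmentation)."""
    if not words:
        return 0.0
    states = range(len(start))
    # alpha[s] = log P(words so far, current state = s)
    alpha = [safe_log(start[s]) + safe_log(emit[s][words[0]]) for s in states]
    for w in words[1:]:
        alpha = [
            log_sum_exp([alpha[r] + safe_log(trans[r][s]) for r in states])
            + safe_log(emit[s][w])
            for s in states
        ]
    return log_sum_exp(alpha)

def naive_bayes_log_likelihood(words, class_unigram):
    """Multinomial Naive Bayes log P(words | class), ignoring the class prior."""
    return sum(safe_log(class_unigram[w]) for w in words)

# Hypothetical toy parameters (assumed, not taken from the paper).
EMIT = [
    {"wheat": 0.6, "price": 0.3, "the": 0.1},  # state 0: class-specific unigram
    {"wheat": 0.1, "price": 0.1, "the": 0.8},  # state 1: background unigram
]
DOC = ["the", "wheat", "price", "the"]

# General case: passages may switch between relevant and irrelevant.
START = [0.5, 0.5]
TRANS = [[0.8, 0.2], [0.2, 0.8]]
hmm_score = forward_log_likelihood(DOC, EMIT, START, TRANS)

# Degenerate case: always in the relevant state, which recovers Naive Bayes.
nb_as_hmm = forward_log_likelihood(DOC, EMIT, [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
nb_score = naive_bayes_log_likelihood(DOC, EMIT[0])
```

In a classifier built along these lines, one such score would be computed per class and combined with a class prior; how the paper estimates the emission and transition parameters is not stated in the abstract.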
Related Publications
Using Maximum Entropy for Text Classification
This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety o...
A comparison of event models for naive bayes text classification
Recent work in text classification has used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-va...
Automated learning of decision rules for text categorization
We describe the results of extensive experiments using optimized rule-based induction methods on large document collections. The goal of these methods is to discover automatical...
Document Language Models, Query Models, and Risk Minimization for Information Retrieval
We present a framework for information retrieval that combines document models and query models using a probabilistic ranking function based on Bayesian decision theory. The fra...
Graph Convolutional Networks for Text Classification
Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolu...
Publication Info
- Year: 2001
- Type: preprint
- Citations: 40
- Access: Closed