Abstract

In this paper, we address the problem of statistical learning for multi-topic text categorization (MTC), whose goal is to choose all relevant topics (a label) from a given set of topics. The proposed algorithm, Maximal Margin Labeling (MML), treats all possible labels as independent classes and learns a multi-class classifier on the induced multi-class categorization problem. To cope with the data sparseness caused by the huge number of possible labels, MML combines some prior knowledge about label prototypes and a maximal margin criterion in a novel way. Experiments with multi-topic Web pages show that MML outperforms existing learning algorithms including Support Vector Machines.

1 Multi-topic Text Categorization (MTC)

This paper addresses the problem of learning for multi-topic text categorization (MTC), whose goal is to select all topics relevant to a text from a given set of topics. In MTC, multiple topics may be relevant to a single text. We thus call a set of topics a label, and say
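The reduction described in the abstract, where each possible label (a set of topics) is treated as its own class, can be illustrated with a minimal Python sketch. It shows only the induced multi-class problem, not MML's label prototypes or its maximal margin criterion. The function names `induce_multiclass` and `count_possible_labels` and the sample topics are hypothetical, and counting the empty topic set as a label is an assumption.

```python
def induce_multiclass(topic_sets):
    """Map each distinct set of topics (a label) to one class id.

    Hypothetical helper for illustration; not taken from the paper.
    """
    label_to_class = {}
    class_ids = []
    for topics in topic_sets:
        label = frozenset(topics)  # order of topics does not matter
        if label not in label_to_class:
            label_to_class[label] = len(label_to_class)
        class_ids.append(label_to_class[label])
    return class_ids, label_to_class


def count_possible_labels(num_topics):
    """Number of subsets of the topic set (assumes the empty set counts)."""
    return 2 ** num_topics


# Made-up documents, each annotated with its set of relevant topics.
documents = [
    {"sports", "politics"},
    {"sports"},
    {"politics", "sports"},
    {"music"},
]
class_ids, label_to_class = induce_multiclass(documents)
print(class_ids)
print(count_possible_labels(10))
```

Because the number of possible labels doubles with every added topic, most labels never appear in a realistic training set, which is the data sparseness the abstract refers to.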

Keywords

Categorization, Text categorization, Margin (machine learning), Computer science, Artificial intelligence, Machine learning, Classifier (UML), Class (philosophy), Support vector machine, Set (abstract data type), Natural language processing, Pattern recognition (psychology)

Related Publications

Seeing stars

We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment analysis work, one must det...

2005 2121 citations

Thumbs up?

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data,...

2002 Proceedings of the ACL-02 conference ... 6965 citations

Publication Info

Year: 2004
Type: article
Volume: 17
Pages: 649-656
Citations: 113
Access: Closed

Cite This

Hideto Kazawa, Tomonori Izumitani, Hirotoshi Taira et al. (2004). Maximal Margin Labeling for Multi-Topic Text Categorization. Vol. 17, pp. 649-656.