Abstract

We present a graph-based semi-supervised learning algorithm to address the sentiment analysis task of rating inference. Given a set of documents (e.g., movie reviews) and accompanying ratings (e.g., "4 stars"), the task calls for inferring numerical ratings for unlabeled documents based on the perceived sentiment expressed by their text. In particular, we are interested in the situation where labeled data is scarce. We place this task in the semi-supervised setting and demonstrate that considering unlabeled reviews in the learning process can improve rating-inference performance. We do so by constructing a graph over both labeled and unlabeled data to encode certain assumptions for this task. We then solve an optimization problem to obtain a smooth rating function over the whole graph. When only limited labeled data is available, this method achieves significantly better predictive accuracy than other methods that ignore the unlabeled examples during training.
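The abstract describes the method only at a high level: build a graph over labeled and unlabeled reviews, then solve for a rating function that varies smoothly across that graph. As a minimal sketch of that general idea, the Python below implements the simpler harmonic-function variant of graph-based semi-supervised learning, not the paper's own objective. Labeled reviews are clamped to their known ratings, and each unlabeled review is repeatedly set to the similarity-weighted average of its neighbours (Gauss-Seidel updates) until the values stop changing. The function name `harmonic_ratings`, the edge-list input format, and the similarity weights are hypothetical; how the graph is built (which reviews are neighbours, how similarity is measured) is assumed to be done by the caller.

```python
from typing import Dict, List, Tuple


def harmonic_ratings(
    n: int,
    edges: List[Tuple[int, int, float]],
    labels: Dict[int, float],
    iters: int = 1000,
    tol: float = 1e-9,
) -> List[float]:
    """Hypothetical sketch: spread known ratings over a similarity graph.

    n       -- number of reviews (nodes 0..n-1)
    edges   -- undirected edges (i, j, w) with similarity weight w > 0
    labels  -- known ratings for the labeled nodes, e.g. {0: 1.0}
    Labeled nodes stay fixed; unlabeled nodes converge to the weighted
    average of their neighbours, which minimises sum w_ij (f_i - f_j)^2.
    """
    if not labels:
        raise ValueError("at least one labeled review is required")

    # Adjacency lists for the undirected, weighted graph.
    nbrs: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for i, j, w in edges:
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))

    # Start every unlabeled node at the mean labeled rating.
    mean = sum(labels.values()) / len(labels)
    f = [labels.get(i, mean) for i in range(n)]
    unlabeled = [i for i in range(n) if i not in labels]

    for _ in range(iters):
        delta = 0.0
        for u in unlabeled:
            total_w = sum(w for _, w in nbrs[u])
            if total_w == 0:
                continue  # isolated review: keep its initial value
            new = sum(w * f[v] for v, w in nbrs[u]) / total_w
            delta = max(delta, abs(new - f[u]))
            f[u] = new  # Gauss-Seidel: reuse updated values immediately
        if delta < tol:
            break
    return f
```

For example, on a chain of five reviews where only the ends are labeled (1 star and 5 stars) and all edges have weight 1, `harmonic_ratings(5, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)], {0: 1.0, 4: 5.0})` returns values close to `[1, 2, 3, 4, 5]`: the inferred ratings interpolate smoothly between the labeled endpoints. Clamping the labeled nodes keeps the known ratings intact, while the weights let strongly similar reviews pull each other's ratings closer together.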

Keywords

Computer science, Inference, Graph, Artificial intelligence, Sentiment analysis, ENCODE, Task (project management), Machine learning, Labeled data, Set (abstract data type), Data set, Natural language processing, Theoretical computer science

Publication Info

Year: 2006
Type: article
Pages: 45-45
Citations: 317
Access: Closed

Citation Metrics

317 citations (OpenAlex)

Cite This

Andrew B. Goldberg, Xiaojin Zhu (2006). Seeing stars when there aren't many stars. Pages 45-45. https://doi.org/10.3115/1654758.1654769

Identifiers

DOI
10.3115/1654758.1654769