Seeing stars when there aren't many stars

Andrew B. Goldberg; Xiaojin Zhu

doi:10.3115/1654758.1654769

Abstract

We present a graph-based semi-supervised learning algorithm to address the sentiment analysis task of rating inference. Given a set of documents (e.g., movie reviews) and accompanying ratings (e.g., "4 stars"), the task calls for inferring numerical ratings for unlabeled documents based on the perceived sentiment expressed by their text. In particular, we are interested in the situation where labeled data is scarce. We place this task in the semi-supervised setting and demonstrate that considering unlabeled reviews in the learning process can improve rating-inference performance. We do so by creating a graph on both labeled and unlabeled data to encode certain assumptions for this task. We then solve an optimization problem to obtain a smooth rating function over the whole graph. When only limited labeled data is available, this method achieves significantly better predictive accuracy over other methods that ignore the unlabeled examples during training.
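
The optimization step described in the abstract can be illustrated with a minimal sketch. It is an assumption-laden toy, not the authors' exact formulation: it uses a generic graph-regularized least-squares objective, a hand-built similarity graph, and hypothetical names (`smooth_ratings`, `weights`, `labels`, `lam`). Labeled documents are pulled toward their known ratings, and every edge penalizes rating differences between similar documents, so unlabeled documents receive ratings that vary smoothly over the graph.

```python
def smooth_ratings(weights, labels, lam=1.0, iters=500):
    """Approximately minimize
        sum_{i labeled} (f_i - y_i)^2 + lam * sum_{edges (i,j)} w_ij * (f_i - f_j)^2
    with Gauss-Seidel coordinate updates (generic sketch, not the paper's objective).

    weights: dict mapping an edge (i, j) to a non-negative similarity w_ij
    labels:  list where labels[i] is a known rating, or None if unlabeled
    """
    n = len(labels)
    neighbors = [[] for _ in range(n)]
    for (i, j), w in weights.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))

    # Start unlabeled nodes at the mean of the known ratings.
    known = [y for y in labels if y is not None]
    mean = sum(known) / len(known)
    f = [y if y is not None else mean for y in labels]

    for _ in range(iters):
        for i in range(n):
            # Setting the derivative w.r.t. f_i to zero gives a weighted average
            # of the neighbors' ratings and, if labeled, the known rating.
            num = lam * sum(w * f[j] for j, w in neighbors[i])
            den = lam * sum(w for _, w in neighbors[i])
            if labels[i] is not None:
                num += labels[i]
                den += 1.0
            if den > 0:
                f[i] = num / den
    return f


# Toy chain of four reviews: only the two end reviews carry ratings.
weights = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
labels = [1.0, None, None, 4.0]
ratings = smooth_ratings(weights, labels)
print([round(r, 3) for r in ratings])  # approximately [1.6, 2.2, 2.8, 3.4]
```

In this toy graph the two unlabeled reviews end up between the labeled endpoints, and the labeled reviews themselves move slightly toward their neighbors because the fit to known ratings is a soft penalty rather than a hard constraint. In practice the edge weights would come from a text-similarity measure between reviews, which this sketch leaves unspecified.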

Keywords

Computer science, Inference, Graph, Artificial intelligence, Sentiment analysis, ENCODE, Task (project management), Machine learning, Labeled data, Set (abstract data type), Data set, Natural language processing, Theoretical computer science

Affiliated Institutions

University of Wisconsin–Madison US

Related Publications

Semi-Supervised Classification of Network Data Using Very Few Labels

Frank Lin, William W. Cohen

The goal of semi-supervised learning (SSL) methods is to reduce the amount of labeled training data required by learning from both labeled and unlabeled instances. Macskassy and...

2010 93 citations

A unified architecture for natural language processing

Ronan Collobert, Jason Weston

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named enti...

2008 5151 citations

Seeing stars

Bo Pang, Lillian Lee

We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment analysis work, one must det...

2005 2121 citations

Thumbs up?

Bo Pang, Lillian Lee, Shivakumar Vaithyanathan

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data,...

2002 Proceedings of the ACL-02 conference ... 6965 citations

Graph Convolutional Networks for Text Classification

Liang Yao, Chengsheng Mao, Yuan Luo

Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolu...

2019 Proceedings of the AAAI Conference on... 1867 citations

Publication Info

Year: 2006
Type: article
Pages: 45-45
Citations: 317
Access: Closed

Cite This

APA Style

                            
Andrew B. Goldberg, Xiaojin Zhu (2006). Seeing stars when there aren't many stars. 45-45. https://doi.org/10.3115/1654758.1654769

Identifiers

DOI: 10.3115/1654758.1654769