Abstract

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
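As a rough illustration of one of the three methods named above, the following is a minimal sketch of a multinomial Naive Bayes sentiment classifier with add-one smoothing. The toy reviews, the function names `train_naive_bayes` and `classify`, and the whitespace tokenization are assumptions made for this example; they are not the authors' data, features, or implementation.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """Fit a multinomial Naive Bayes model.

    docs: list of (tokens, label) pairs.
    Returns (log_priors, log_likelihoods, vocab).
    """
    vocab = {tok for tokens, _ in docs for tok in tokens}
    label_counts = Counter(label for _, label in docs)
    token_counts = {label: Counter() for label in label_counts}
    for tokens, label in docs:
        token_counts[label].update(tokens)

    log_priors = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    log_likelihoods = {}
    for c, counts in token_counts.items():
        # Add-one smoothing: every vocabulary word gets one extra count per class.
        total = sum(counts.values()) + len(vocab)
        log_likelihoods[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
    return log_priors, log_likelihoods, vocab


def classify(tokens, model):
    """Return the label with the highest posterior log-score."""
    log_priors, log_likelihoods, vocab = model
    scores = {}
    for c in log_priors:
        # Tokens never seen in training are ignored.
        scores[c] = log_priors[c] + sum(
            log_likelihoods[c][t] for t in tokens if t in vocab
        )
    return max(scores, key=scores.get)


# Hypothetical toy data, invented for illustration only.
TOY_REVIEWS = [
    ("a brilliant moving film".split(), "pos"),
    ("great acting and a great script".split(), "pos"),
    ("a dull boring mess".split(), "neg"),
    ("boring plot and terrible acting".split(), "neg"),
]

model = train_naive_bayes(TOY_REVIEWS)
print(classify("a great film".split(), model))
print(classify("terrible and boring".split(), model))
```

A classifier this simple treats each word as independent evidence, so it has no way to handle negation or a review that lists positive words before reversing its verdict. That limitation is one plausible reason sentiment can be harder than topic classification.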

Keywords

Computer science, Naive Bayes classifier, Support vector machine, Categorization, Artificial intelligence, Sentiment analysis, Machine learning, Principle of maximum entropy, Entropy (arrow of time), Natural language processing, Statistical classification

Related Publications

A sentimental education

Sentiment analysis seeks to identify the viewpoint(s) underlying a text span; an example application is classifying a movie review as "thumbs up" or "thumbs down". To determine ...

2004, 3318 citations

Seeing stars

We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment analysis work, one must det...

2005, 2121 citations

Publication Info

Year: 2002
Type: article
Volume: 10
Pages: 79-86
Citations: 6965
Access: Closed

Citation Metrics

OpenAlex: 6965
Influential: 813
CrossRef: 3781

Cite This

Bo Pang, Lillian Lee, Shivakumar Vaithyanathan (2002). Thumbs up? Proceedings of the ACL-02 conference on Empirical methods in natural language processing - EMNLP '02, 10, 79-86. https://doi.org/10.3115/1118693.1118704

Identifiers

DOI: 10.3115/1118693.1118704
arXiv: cs/0205070
