Abstract

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
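As a rough illustration of the kind of classifier the abstract names, the following is a minimal sketch of a Naive Bayes sentiment classifier over unigram counts with add-one smoothing. It is not the authors' implementation or experimental setup: the function names `train_naive_bayes` and `classify`, the whitespace tokenization, and the four-sentence `TOY_REVIEWS` corpus are all assumptions made purely for demonstration.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """Fit a multinomial Naive Bayes model.

    docs: list of (text, label) pairs.
    Returns (log_priors, word_counts, vocab).
    """
    class_docs = Counter()   # number of documents per class
    word_counts = {}         # per-class word frequency tables
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():  # naive whitespace tokenization (assumption)
            counts[word] += 1
            vocab.add(word)
    total = sum(class_docs.values())
    log_priors = {c: math.log(n / total) for c, n in class_docs.items()}
    return log_priors, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior under the model."""
    log_priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, log_prior in log_priors.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one (Laplace) smoothing
        score = log_prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not the movie-review corpus used in the paper.
TOY_REVIEWS = [
    ("a brilliant and moving film", "pos"),
    ("great acting and a wonderful story", "pos"),
    ("dull plot and terrible acting", "neg"),
    ("a boring and awful film", "neg"),
]

model = train_naive_bayes(TOY_REVIEWS)
print(classify("a wonderful and moving story", model))
print(classify("terrible and boring", model))
```

On this toy data the first call is labeled "pos" and the second "neg". A realistic experiment would need a proper corpus, held-out evaluation, and a comparison against the other methods the abstract mentions.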

Keywords

Naive Bayes classifier, Computer science, Categorization, Artificial intelligence, Sentiment analysis, Support vector machine, Machine learning, Principle of maximum entropy, Natural language processing

Related Publications

Thumbs up?

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data,...

2002 Proceedings of the ACL-02 conference ... 6965 citations

A sentimental education

Sentiment analysis seeks to identify the viewpoint(s) underlying a text span; an example application is classifying a movie review as "thumbs up" or "thumbs down". To determine ...

2004 3318 citations

Seeing stars

We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment analysis work, one must det...

2005 2121 citations

Publication Info

Year: 2002
Type: preprint
Citations: 2207
Access: Closed

Citation Metrics

2207 (OpenAlex)

Cite This

Bo Pang, Lillian Lee, Shivakumar Vaithyanathan (2002). Thumbs up? Sentiment Classification using Machine Learning Techniques. arXiv (Cornell University). https://doi.org/10.48550/arxiv.cs/0205070

Identifiers

DOI: 10.48550/arxiv.cs/0205070