Abstract

Subscribers to popular news or blog feeds (RSS/Atom) often face information overload, as these feed sources usually deliver a large number of items periodically. One possible solution is to cluster similar items in the feed reader, making the information more manageable for the user. Clustering items at the feed-reader end is challenging because usually only a small part of the actual article is received through the feed. In this paper, we propose a method for improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia. Empirical results indicate that this enriched representation of text items can substantially improve clustering accuracy compared with the conventional bag-of-words representation.
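The abstract does not say how the Wikipedia features are obtained or which clustering algorithm is used. The code below is only a minimal sketch under stated assumptions, meant to show why enrichment helps short texts. A hypothetical toy dictionary, `CONCEPT_INDEX`, stands in for Wikipedia. Enrichment adds one `wiki:<title>` feature per concept hit. Similarity is cosine, and a simple single-pass leader clustering is swapped in for brevity. None of these names or choices come from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a Wikipedia lookup (an assumption, not the paper's
# method): maps a term to titles of Wikipedia articles associated with it.
CONCEPT_INDEX = {
    "apple": ["Apple Inc."],
    "iphone": ["Apple Inc.", "Smartphone"],
    "cupertino": ["Apple Inc."],
    "smartphone": ["Smartphone"],
    "federer": ["Roger Federer", "Tennis"],
    "wimbledon": ["Wimbledon Championships", "Tennis"],
    "swiss": ["Switzerland"],
    "tennis": ["Tennis"],
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text: str) -> Counter:
    """Conventional bag-of-words representation: term -> frequency."""
    return Counter(tokenize(text))


def enriched(text: str, index: dict[str, list[str]] = CONCEPT_INDEX) -> Counter:
    """Bag of words plus one 'wiki:<title>' feature per concept hit."""
    features = bag_of_words(text)
    for token in tokenize(text):
        for title in index.get(token, []):
            features["wiki:" + title] += 1
    return features


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(vectors: list[Counter], threshold: float) -> list[int]:
    """Single-pass leader clustering: each item joins the first cluster whose
    leader is at least `threshold`-similar, otherwise it starts a new cluster."""
    leaders: list[Counter] = []
    labels: list[int] = []
    for vec in vectors:
        for cluster_id, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(cluster_id)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


# Hypothetical feed item titles, invented for illustration.
ITEMS = [
    "Apple unveils new iPhone",
    "Cupertino firm launches smartphone",
    "Federer wins Wimbledon final",
    "Swiss star takes tennis title",
]

if __name__ == "__main__":
    print(leader_cluster([bag_of_words(t) for t in ITEMS], 0.2))  # [0, 1, 2, 3]
    print(leader_cluster([enriched(t) for t in ITEMS], 0.2))      # [0, 0, 1, 1]
```

With the same threshold, plain bag-of-words puts every item in its own cluster because the short titles share no words. The enriched vectors group the two technology items and the two tennis items through their shared concept features.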

Keywords

Cluster analysis, Computer science, RSS, Representation (politics), Information overload, Task (project management), Information retrieval, Data mining, Artificial intelligence, World Wide Web, Engineering

Publication Info

Year: 2007
Type: article
Pages: 787-788
Citations: 335
Access: Closed

Cite This

Somnath Banerjee, Krishnan Ramanathan, Ajay Gupta (2007). Clustering short texts using Wikipedia. Pages 787-788. https://doi.org/10.1145/1277741.1277909

Identifiers

DOI: 10.1145/1277741.1277909