Abstract

In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in Rd, one can compute a weighted set S ⊆ P, of size O(k ε-d log n), such that one can compute the k-median/means clustering on S instead of on P, and get an (1+ε)-approximation. As a result, we improve the fastest known algorithms for (1+ε)-approximate k-means and k-median. Our algorithms have linear running time for a fixed k and ε. In addition, we can maintain the (1+ε)-approximate k-median or k-means clustering of a stream when points are being only inserted, using polylogarithmic space and update time.

Keywords

Cluster analysisDimension (graph theory)CombinatoricsSet (abstract data type)MathematicsPoint (geometry)Binary logarithmApproximation algorithmAlgorithmComputer scienceDiscrete mathematicsStatisticsGeometry

Affiliated Institutions

Related Publications

How fast is the k-means method?

We present polynomial upper and lower bounds on the number of iterations performed by the k-means method (a.k.a. Lloyd's method) for k-means clustering. Our upper bounds are pol...

2005 Symposium on Discrete Algorithms 46 citations

Publication Info

Year
2004
Type
article
Pages
291-300
Citations
486
Access
Closed

External Links

Social Impact

Social media, news, blog, policy document mentions

Citation Metrics

486
OpenAlex

Cite This

Sariel Har-Peled, Soham Mazumdar (2004). On coresets for k-means and k-median clustering. , 291-300. https://doi.org/10.1145/1007352.1007400

Identifiers

DOI
10.1145/1007352.1007400