On coresets for k-means and k-median clustering

Abstract

In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that given a point set P in Rd, one can compute a weighted set S ⊆ P, of size O(k ε-d log n), such that one can compute the k-median/means clustering on S instead of on P, and get an (1+ε)-approximation. As a result, we improve the fastest known algorithms for (1+ε)-approximate k-means and k-median. Our algorithms have linear running time for a fixed k and ε. In addition, we can maintain the (1+ε)-approximate k-median or k-means clustering of a stream when points are being only inserted, using polylogarithmic space and update time.

Keywords

Cluster analysisDimension (graph theory)CombinatoricsSet (abstract data type)MathematicsPoint (geometry)Binary logarithmApproximation algorithmAlgorithmComputer scienceDiscrete mathematicsStatisticsGeometry

Affiliated Institutions

University of Illinois Urbana-Champaign US

Related Publications

How fast is the k-means method?

Sariel Har-Peled , Bardia Sadri

We present polynomial upper and lower bounds on the number of iterations performed by the k-means method (a.k.a. Lloyd's method) for k-means clustering. Our upper bounds are pol...

2005 Symposium on Discrete Algorithms 46 citations

A local search approximation algorithm for k-means clustering

Tapas Kanungo , David M. Mount , Nathan S. Netanyahu +3 more

In k-means clustering we are given a set of n data points in d-dimensional space ℜd and an integer k, and the problem is to determine a set of k points in ℜd, called centers, to...

2002 497 citations

Better streaming algorithms for clustering problems

Moses Charikar , Liadan O'Callaghan , Rina Panigrahy‎

We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of stor...

2003 236 citations

Approximation schemes for clustering problems

W. Fernandez de la Véga , Marek Karpiński , Claire Kenyon +1 more

Let k be a fixed integer. We consider the problem of partitioning an input set of points endowed with a distance function into k clusters. We give polynomial time approximation ...

2003 158 citations

A Simple Linear Time (1+ &#8714;) -Approximation Algorithm for k-Means Clustering in Any Dimensions

Amit Kumar , Yogish Sabharwal , Subhankar Sen

We present the first linear time (1+ε)-approximation algorithm for the k-means problem for fixed k and ε. Our algorithm runs in O(nd) time, which is linear in the size of the in...

2004 241 citations

Publication Info

Year: 2004
Type: article
Pages: 291-300
Citations: 486
Access: Closed

External Links

View on DOI.org

Social Impact

Altmetric

On coresets for k-means and k-median clustering

PlumX Metrics

Social media, news, blog, policy document mentions

Citation Metrics

486

OpenAlex

Cite This

APA Style

                            
                                    Sariel Har-Peled, 
                                
                                    Soham Mazumdar
                                
                            (2004). 
                            On coresets for k-means and k-median clustering. 
                            
                            , 291-300.
                            https://doi.org/10.1145/1007352.1007400

Identifiers

DOI: 10.1145/1007352.1007400