Abstract

We investigate variants of Lloyd's heuristic for k-means clustering of high-dimensional data, both to explain its continued popularity among practitioners half a century after its introduction and to suggest improvements in how it is applied. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The guarantee on output quality does not come at the expense of speed: some of our algorithms are potentially faster in practice than the variants of Lloyd's method currently in use. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
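The two ingredients named in the abstract, a probabilistic seeding step and a Lloyd-type iteration, can be illustrated with a small sketch. It assumes points are tuples of floats and uses generic D² ("distance-squared") seeding, in which each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. This is in the spirit of the seeding the abstract describes but is not claimed to be the paper's exact procedure; the helper names `d2_seed`, `lloyd` and `kmeans_cost` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def d2_seed(points, k, rng):
    """Hypothetical generic D^2 seeding (not necessarily the paper's exact
    procedure): the first center is chosen uniformly, and each later center
    is drawn with probability proportional to its squared distance from the
    nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point already coincides with a center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points, centers, max_iters=100):
    """Lloyd's iteration: assign each point to its nearest center, move each
    center to the mean of its assigned points, and repeat until the centers
    stop moving (or max_iters rounds have run)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                n = len(members)
                # Coordinate-wise mean of the points assigned to this center.
                new_centers.append(tuple(sum(coord) / n for coord in zip(*members)))
            else:
                new_centers.append(c)  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged: re-centering no longer changes anything
        centers = new_centers
    return centers


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: two well-separated groups of three points each.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers = lloyd(pts, d2_seed(pts, 2, rng))
    print(sorted(centers), kmeans_cost(pts, centers))
```

Plain Lloyd's iteration only reaches a local optimum of the k-means cost, so its result depends on the starting centers; that dependence is why the choice of seeding matters.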

Publication Info

Year: 2006
Type: article
Pages: 165-176
Citations: 265
Access: Closed

Cite This

Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman et al. (2006). The Effectiveness of Lloyd-Type Methods for the k-Means Problem. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2006), pp. 165-176. https://doi.org/10.1109/focs.2006.75

Identifiers

DOI: 10.1109/focs.2006.75