The k-means Algorithm: A Comprehensive Survey and Performance Evaluation

Abstract

The k-means clustering algorithm is considered one of the most powerful and popular data mining algorithms in the research community. Despite its popularity, the algorithm has certain limitations, including sensitivity to the random initialization of the centroids, which can lead to unexpected convergence. Additionally, the algorithm requires the number of clusters to be defined beforehand, which is responsible for different cluster shapes and outlier effects. A fundamental problem of the k-means algorithm is its inability to handle various data types. This paper provides a structured and synoptic overview of research conducted on the k-means algorithm to overcome such shortcomings. Variants of the k-means algorithm, including recent developments, are discussed, and their effectiveness is investigated through experimental analysis on a variety of datasets. The detailed experimental analysis, along with a thorough comparison among different k-means clustering algorithms, differentiates our work from other existing survey papers. Furthermore, it provides a clear and thorough understanding of the k-means algorithm along with its different research directions.
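
The initialization sensitivity described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain Lloyd-style k-means written for illustration, and the names `squared_distance` and `kmeans`, the toy dataset, and the two starting configurations are all assumptions made for this example. The number of clusters is fixed in advance by the length of `initial_centroids`, which mirrors the second limitation the abstract mentions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100):
    """Illustrative Lloyd-style k-means; returns (centroids, labels, sse)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the run has converged
        centroids = new_centroids
    # Sum of squared errors (within-cluster sum of squares) for the final state.
    sse = sum(squared_distance(p, centroids[lab])
              for p, lab in zip(points, labels))
    return centroids, labels, sse


# Toy data (assumed for illustration): corners of a 4-by-1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start with one centroid on each short side: converges to the left/right split.
_, _, sse_good = kmeans(points, [(0, 0), (4, 0)])

# Start with both centroids on the same short side: converges to a top/bottom split.
_, _, sse_poor = kmeans(points, [(0, 0), (0, 1)])

print(sse_good, sse_poor)  # 1.0 16.0
```

Both runs converge, but the second stops at a local optimum whose error is sixteen times larger, which is one way to read the "unexpected convergence" the abstract refers to. Seeding strategies and repeated restarts are common ways to reduce this effect.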

Keywords

Cluster analysis, Computer science, Data mining, Algorithm, Convergence (economics), Initialization, Centroid, Outlier, Popularity, Variety (cybernetics), k-means clustering, Machine learning, Artificial intelligence

Publication Info

Year: 2020
Type: article
Volume: 9
Issue: 8
Pages: 1295
Citations: 1335
Access: Closed

Cite This

APA Style

Ahmed, M., Seraj, R., & Islam, S. M. S. (2020). The k-means Algorithm: A Comprehensive Survey and Performance Evaluation. Electronics, 9(8), 1295. https://doi.org/10.3390/electronics9081295

Identifiers

DOI: 10.3390/electronics9081295