Unsupervised K-Means Clustering Algorithm

Abstract

The k-means algorithm is generally the best-known and most widely used clustering method. Various extensions of k-means have been proposed in the literature. Although k-means is regarded as an unsupervised approach to clustering in pattern recognition and machine learning, the algorithm and its extensions are always influenced by initialization and require the number of clusters to be given a priori. In that sense, the k-means algorithm is not exactly an unsupervised clustering method. In this paper, we construct an unsupervised learning schema for the k-means algorithm so that it is free of initialization and parameter selection and can simultaneously find an optimal number of clusters. That is, we propose a novel unsupervised k-means (U-k-means) clustering algorithm that automatically finds an optimal number of clusters without any initialization or parameter selection. The computational complexity of the proposed U-k-means clustering algorithm is also analyzed. The proposed U-k-means is compared with other existing methods, and the experimental results demonstrate these advantages of the proposed algorithm.
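The dependence on initialization and on a preset number of clusters that the abstract describes can be seen in a minimal sketch of standard k-means (Lloyd's algorithm). This is not the U-k-means method proposed in the paper, whose details are not given on this page. The function name `lloyd_kmeans`, the synthetic three-blob data, and the choice of seeds are all assumptions made for illustration.

```python
import numpy as np


def lloyd_kmeans(X, k, seed, n_iter=100):
    """Plain k-means (Lloyd's algorithm). NOT the paper's U-k-means.

    Both k and the random seed used for initialization must be supplied
    by the caller, which is the dependence the abstract points out.
    """
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as starting centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and within-cluster sum of squared errors (SSE).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2[np.arange(len(X)), labels].sum()
    return centers, labels, sse


# Hypothetical synthetic data: three Gaussian blobs in 2-D.
data_rng = np.random.default_rng(0)
blob_means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([m + data_rng.normal(scale=0.5, size=(50, 2)) for m in blob_means])

# The user has to choose k and a seed; the SSE can differ across both.
for k in (2, 3, 4):
    for seed in (0, 1, 2):
        _, _, sse = lloyd_kmeans(X, k, seed)
        print(f"k={k} seed={seed} SSE={sse:.2f}")
```

Running the loop prints one SSE per (k, seed) pair. Because SSE generally keeps decreasing as k grows, it cannot by itself select the number of clusters, which is the gap the abstract says U-k-means is designed to close.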

Keywords

Computer science, Cluster analysis, Artificial intelligence, Canopy clustering algorithm, Unsupervised learning, Pattern recognition (psychology), Correlation clustering, Algorithm

Affiliated Institutions

Chung Yuan Christian University (TW)

Related Publications

Stability-Based Validation of Clustering Solutions

Tilman Lange , Volker Röth , Mikio L. Braun +1 more

Data clustering describes a set of frequently employed techniques in exploratory data analysis to extract “natural” group structure in data. Such groupings need to be validated ...

2004 Neural Computation 508 citations

Weighted Graph Cuts without Eigenvectors: A Multilevel Approach

Inderjit S. Dhillon , Yuqiang Guan , Brian Kulis

A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods....

2007 IEEE Transactions on Pattern Analysis... 1016 citations

The k-means Algorithm: A Comprehensive Survey and Performance Evaluation

Mohiuddin Ahmed , Raihan Seraj , Syed Mohammed Shamsul Islam

The k-means clustering algorithm is considered one of the most powerful and popular data mining algorithms in the research community. However, despite its popularity, the algori...

2020 Electronics 1335 citations

Kernel k-means

Inderjit S. Dhillon , Yuqiang Guan , Brian Kulis

Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have ...

2004 1184 citations

Deep Clustering for Unsupervised Learning of Visual Features

Mathilde Caron , Piotr Bojanowski , Armand Joulin +1 more

Clustering is a class of unsupervised learning methods that has been extensively applied and studied in computer vision. Little work has been done to adapt it to the end-to-end ...

2018 Lecture notes in computer science 2355 citations

Publication Info

Year: 2020
Type: article
Volume: 8
Pages: 80716-80727
Citations: 1917
Access: Closed

Citation Metrics

OpenAlex: 1917
CrossRef: 1658

Cite This

APA Style

                            
Kristina P. Sinaga, Miin‐Shen Yang (2020). Unsupervised K-Means Clustering Algorithm. IEEE Access, 8, 80716-80727. https://doi.org/10.1109/access.2020.2988796

Identifiers

DOI: 10.1109/access.2020.2988796
