Abstract

A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlapping clusters. To provide a variety of clustering solutions, the data sets were analyzed by four hierarchical clustering methods. External criterion measures indicated excellent recovery of the true cluster structure by the methods at the correct hierarchy level. Thus, the clustering present in the data was quite strong. The simulation results for the stopping rules revealed a wide range in their ability to determine the correct number of clusters in the data. Several procedures worked fairly well, whereas others performed rather poorly. Thus, the latter group of rules would appear to have little validity, particularly for data sets containing distinct clusters. Applied researchers are urged to select one or more of the better criteria. However, users are cautioned that the performance of some of the criteria may be data dependent.
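The kind of experiment the abstract describes can be sketched in a few lines: generate data with a known number of distinct clusters, build a hierarchy, and ask a stopping rule which level to cut at. The sketch below is illustrative only and makes several assumptions not taken from the abstract: the data generator (`make_clusters`), the use of average linkage, and the choice of the Calinski-Harabasz variance-ratio index as the stopping rule are all stand-ins, since the abstract does not name the four clustering methods or the 30 procedures.

```python
import random

def make_clusters(k, n_per, dim=2, spread=0.5, gap=10.0, seed=0):
    """Hypothetical generator: k well-separated Gaussian blobs along one axis."""
    rng = random.Random(seed)
    points = []
    for c in range(k):
        center = [gap * c] + [0.0] * (dim - 1)
        for _ in range(n_per):
            points.append([rng.gauss(m, spread) for m in center])
    return points

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return [sum(p[d] for p in pts) / n for d in range(len(pts[0]))]

def agglomerate(points):
    """Naive average-linkage agglomeration (assumed method).

    Returns a dict mapping each number of clusters k to the partition
    (a list of lists of point indices) found at that hierarchy level.
    """
    clusters = [[i] for i in range(len(points))]
    levels = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(sq_dist(points[a], points[b]) ** 0.5
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
        levels[len(clusters)] = [list(c) for c in clusters]
    return levels

def calinski_harabasz(points, partition):
    """Variance-ratio index: [B / (k - 1)] / [W / (n - k)]."""
    n, k = len(points), len(partition)
    grand = centroid(points)
    between = within = 0.0
    for members in partition:
        pts = [points[i] for i in members]
        c = centroid(pts)
        between += len(pts) * sq_dist(c, grand)
        within += sum(sq_dist(p, c) for p in pts)
    return (between / (k - 1)) / (within / (n - k))

def estimate_k(points, k_max=8):
    """Pick the hierarchy level (2..k_max) that maximizes the index."""
    levels = agglomerate(points)
    scores = {k: calinski_harabasz(points, levels[k])
              for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    for true_k in (2, 3, 4, 5):
        data = make_clusters(true_k, n_per=15, seed=true_k)
        k_hat, _ = estimate_k(data)
        print(f"true k = {true_k}, estimated k = {k_hat}")
```

A Monte Carlo study in this spirit would repeat the loop over many generated data sets and several stopping rules, then tally how often each rule recovers the true number of clusters. Because the blobs here are far apart, the example says nothing about how a rule behaves on overlapping or irregular clusters, which is where the abstract's caution about data dependence applies.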

Keywords

Cluster analysis, Data mining, Computer science, Set (abstract data type), Range (aeronautics), Data set, Hierarchical clustering, Cluster (spacecraft), Hierarchy, Monte Carlo method, Variety (cybernetics), Statistics, Mathematics, Artificial intelligence

Publication Info

Year: 1985
Type: article
Volume: 50
Issue: 2
Pages: 159-179
Citations: 3811
Access: Closed

Citation Metrics

3811 (OpenAlex)

Cite This

Glenn W. Milligan, Martha C. Cooper (1985). An Examination of Procedures for Determining the Number of Clusters in a Data Set. Psychometrika, 50(2), 159-179. https://doi.org/10.1007/bf02294245

Identifiers

DOI: 10.1007/bf02294245