Abstract
A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlapping clusters. To provide a variety of clustering solutions, the data sets were analyzed by four hierarchical clustering methods. External criterion measures indicated excellent recovery of the true cluster structure by the methods at the correct hierarchy level. Thus, the clustering present in the data was quite strong. The simulation results for the stopping rules revealed a wide range in their ability to determine the correct number of clusters in the data. Several procedures worked fairly well, whereas others performed rather poorly. Thus, the latter group of rules would appear to have little validity, particularly for data sets containing distinct clusters. Applied researchers are urged to select one or more of the better criteria. However, users are cautioned that the performance of some of the criteria may be data dependent.
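The evaluation design described above can be illustrated with a minimal sketch, which is not the authors' code. The abstract does not name the 30 stopping rules or the four hierarchical methods, so the sketch assumes stand-ins: the Calinski-Harabasz index as the stopping rule and a naive average-linkage procedure as the clustering method. The cluster centers, sample sizes, and function names (`make_data`, `estimate_k`, `monte_carlo`) are hypothetical choices made for illustration only.

```python
import random

def make_data(k, rng, per_cluster=15, sd=0.5):
    """Draw k distinct, nonoverlapping 2-D Gaussian clusters (hypothetical layout)."""
    centers = [(0, 0), (10, 0), (0, 10), (10, 10), (20, 0)][:k]
    return [(rng.gauss(cx, sd), rng.gauss(cy, sd))
            for cx, cy in centers for _ in range(per_cluster)]

def sqdist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def average_linkage(points, max_k):
    """Naive agglomerative clustering; returns {k: partition} for k = 2..max_k."""
    clusters = [[p] for p in points]
    levels = {}
    while len(clusters) > 2:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise Euclidean distance between the two clusters.
                d = sum(sqdist(p, q) ** 0.5
                        for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        if len(clusters) <= max_k:
            levels[len(clusters)] = [list(c) for c in clusters]
    return levels

def ss(cluster):
    """Sum of squared distances from a cluster's points to its centroid."""
    cx = sum(p[0] for p in cluster) / len(cluster)
    cy = sum(p[1] for p in cluster) / len(cluster)
    return sum(sqdist(p, (cx, cy)) for p in cluster)

def calinski_harabasz(partition):
    """Ratio of between-cluster to within-cluster dispersion, each scaled by its df."""
    n = sum(len(c) for c in partition)
    k = len(partition)
    total = ss([p for c in partition for p in c])
    within = sum(ss(c) for c in partition)
    return ((total - within) / (k - 1)) / (within / (n - k))

def estimate_k(points, max_k=6):
    """Stopping rule: choose the hierarchy level with the largest index value."""
    levels = average_linkage(points, max_k)
    return max(levels, key=lambda k: calinski_harabasz(levels[k]))

def monte_carlo(reps=5, seed=0):
    """Count, for each true k, how often the rule recovers the correct number."""
    rng = random.Random(seed)
    hits = {}
    for true_k in (2, 3, 4, 5):
        hits[true_k] = sum(estimate_k(make_data(true_k, rng)) == true_k
                           for _ in range(reps))
    return hits

if __name__ == "__main__":
    print(monte_carlo())
```

Calling `monte_carlo()` returns a dictionary that maps each true number of clusters to the count of replications in which the rule picked that number. A fuller comparison would repeat the loop over several stopping rules and clustering methods; this sketch covers only one of each.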
Publication Info
- Year: 1985
- Type: article
- Volume: 50
- Issue: 2
- Pages: 159-179
- Citations: 3811
- Access: Closed
Identifiers
- DOI: 10.1007/bf02294245