Abstract
Abstract Many intuitively appealing methods have been suggested for clustering data, however, interpretation of their results has been hindered by the lack of objective criteria. This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data. These criteria depend on a measure of similarity between two different clusterings of the same set of data; the measure essentially considers how each pair of data points is assigned in each clustering.
Keywords
Affiliated Institutions
Related Publications
Stability-Based Validation of Clustering Solutions
Data clustering describes a set of frequently employed techniques in exploratory data analysis to extract “natural” group structure in data. Such groupings need to be validated ...
Model-Based Clustering, Discriminant Analysis, and Density Estimation
Cluster analysis is the automated search for groups of related observations in a dataset. Most clustering done in practice is based largely on heuristic but intuitively reasonab...
Kernel k-means
Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have ...
Search and clustering orders of magnitude faster than BLAST
Abstract Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification. Results: UBLAS...
Principal component analysis
Abstract Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter‐correlated quantitative d...
Publication Info
- Year
- 1971
- Type
- article
- Volume
- 66
- Issue
- 336
- Pages
- 846-846
- Citations
- 1823
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.2307/2284239