Abstract
The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967). However, as currently implemented, it does not allow the specification of which features (orientation, size and shape) are to be common to all clusters and which may differ between clusters. Also, it is restricted to Gaussian distributions and it does not allow for noise. We propose ways of overcoming these limitations. A reparameterization of the covariance matrix allows us to specify that some features, but not all, be the same for all clusters. A practical framework for non-Gaussian clustering is outlined, and a means of incorporating noise in the form of a Poisson process is described. An approximate Bayesian method for choosing the number of clusters is given. The performance of the proposed methods is studied by simulation, with encouraging results. The methods are applied to the analysis of a data set arising in the study of diabetes, and the results seem better than those of previous analyses.
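The reparameterization described in the abstract decomposes each cluster's covariance matrix into factors governing orientation, size, and shape, so that any subset of these features can be constrained to be equal across clusters. A minimal sketch, assuming Python with scikit-learn: the library's `covariance_type` options impose coarser analogues of these constraints (it does not implement the paper's full parameterization), and BIC is used below only as a stand-in for the approximate Bayesian criterion mentioned above.

```python
# Sketch of constrained Gaussian model-based clustering (assumes numpy + scikit-learn).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic clusters with different orientations and sizes.
X = np.vstack([
    rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=200),
    rng.multivariate_normal([4, 4], [[0.3, -0.1], [-0.1, 0.5]], size=200),
])

# 'full'      : orientation, size, and shape all vary between clusters
# 'tied'      : one covariance matrix shared by all clusters
# 'diag'      : axis-aligned clusters; size and shape vary
# 'spherical' : spherical clusters; only size varies
for cov_type in ["full", "tied", "diag", "spherical"]:
    gm = GaussianMixture(n_components=2, covariance_type=cov_type,
                         random_state=0).fit(X)
    print(f"{cov_type:>9}: BIC = {gm.bic(X):.1f}")
```

Comparing such criteria across constrained models is one way to choose both the covariance structure and the number of clusters from the data.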
Related Publications
Robust Improper Maximum Likelihood: Tuning, Computation, and a Comparison With Other Methods for Robust Gaussian Clustering
The two main topics of this paper are the introduction of the "optimally tuned improper maximum likelihood estimator" (OTRIMLE) for robust clustering based on the multivariate...
Combining Mixture Components for Clustering
Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used....
Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data
Clustering methods provide a powerful tool for the exploratory analysis of high-dimension, low–sample size (HDLSS) data sets, such as gene expression microarray data. A ...
An Examination of Procedures for Determining the Number of Clusters in a Data Set
A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlappi...
Strong Consistency of $K$-Means Clustering
A random sample is divided into the $k$ clusters that minimise the within cluster sum of squares. Conditions are found that ensure the almost sure convergence, as the sample siz...
Publication Info
- Year: 1993
- Type: article
- Volume: 49
- Issue: 3
- Pages: 803-803
- Citations: 2331
- Access: Closed
Identifiers
- DOI: 10.2307/2532201