Abstract

The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967). However, as currently implemented, it does not allow the specification of which features (orientation, size and shape) are to be common to all clusters and which may differ between clusters. Also, it is restricted to Gaussian distributions and it does not allow for noise. We propose ways of overcoming these limitations. A reparameterization of the covariance matrix allows us to specify that some features, but not all, be the same for all clusters. A practical framework for non-Gaussian clustering is outlined, and a means of incorporating noise in the form of a Poisson process is described. An approximate Bayesian method for choosing the number of clusters is given. The performance of the proposed methods is studied by simulation, with encouraging results. The methods are applied to the analysis of a data set arising in the study of diabetes, and the results seem better than those of previous analyses.
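
The abstract does not spell out the reparameterization. As a minimal sketch, assume it is an eigenvalue decomposition of each cluster covariance, Sigma_k = lambda_k * D_k * A_k * D_k^T, where lambda_k (size) is taken here to be the largest eigenvalue, A_k (shape) is the diagonal matrix of eigenvalues scaled so its first entry is 1, and D_k (orientation) holds the eigenvectors. The function names below are hypothetical and serve only to illustrate how size, shape and orientation can be separated and then shared or varied across clusters.

```python
import numpy as np

def decompose_covariance(sigma):
    """Split a symmetric positive-definite matrix into (size, shape, orientation).

    Assumed convention: size is the largest eigenvalue, shape is the vector of
    eigenvalues divided by the largest one, orientation is the eigenvector matrix.
    """
    eigvals, eigvecs = np.linalg.eigh(sigma)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder so the largest comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    size = eigvals[0]
    shape = eigvals / size                     # (1, a_2, ..., a_p), each a_j <= 1
    return size, shape, eigvecs

def compose_covariance(size, shape, orientation):
    """Rebuild the covariance matrix from its three geometric features."""
    return size * orientation @ np.diag(shape) @ orientation.T

if __name__ == "__main__":
    sigma_1 = np.array([[4.0, 1.2], [1.2, 2.0]])
    sigma_2 = 2.5 * sigma_1                    # same shape and orientation, larger size
    size_1, shape_1, orient_1 = decompose_covariance(sigma_1)
    size_2, shape_2, _ = decompose_covariance(sigma_2)
    print("sizes:", size_1, size_2)
    print("shapes equal:", np.allclose(shape_1, shape_2))
    print("round trip ok:", np.allclose(
        compose_covariance(size_1, shape_1, orient_1), sigma_1))
```

Under this convention, constraining clusters to share a feature amounts to fixing the corresponding factor across clusters while the others are estimated per cluster; for example, a common shape with free sizes and orientations.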

Keywords

Cluster analysis; Computer science; Gaussian; Covariance; Noise (video); Algorithm; Covariance matrix; Determining the number of clusters in a data set; Gaussian process; Poisson distribution; Set (abstract data type); Bayesian probability; Data mining; Pattern recognition (psychology); Mathematics; Artificial intelligence; Statistics; Correlation clustering; CURE data clustering algorithm; Physics

Related Publications

Strong Consistency of $K$-Means Clustering

A random sample is divided into the $k$ clusters that minimise the within cluster sum of squares. Conditions are found that ensure the almost sure convergence, as the sample siz...

1981, The Annals of Statistics, 452 citations

Publication Info

Year: 1993
Type: article
Volume: 49
Issue: 3
Pages: 803-803
Citations: 2331
Access: Closed

Citation Metrics

2331 (OpenAlex)

Cite This

Jeffrey D. Banfield, Adrian E. Raftery (1993). Model-Based Gaussian and Non-Gaussian Clustering. Biometrics, 49(3), 803-803. https://doi.org/10.2307/2532201

Identifiers

DOI: 10.2307/2532201