Abstract

This article concerns the derivation and use of a measure of similarity between two hierarchical clusterings. The measure, B_k, is derived from the matching matrix [m_ij], formed by cutting each of the two hierarchical trees into k clusters and counting the number of objects common to each pair of clusters, one from each tree. The mean and variance of B_k are determined under the assumption that the margins of [m_ij] are fixed. Because a tree can be cut at any level, B_k represents a collection of measures for k = 2, …, n − 1, where n is the number of objects, and (k, B_k) plots are found to be useful in portraying the similarity of two clusterings. B_k is compared with the measures of similarity proposed by Baker (1974) and Rand (1971). The use of (k, B_k) plots for studying clustering methods is explored through a series of Monte Carlo sampling experiments, and an example of their use on real data is given.
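The computation summarized in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code. It assumes each tree is supplied as a hypothetical list of pairwise merges in agglomeration order (cutting after the first n - k merges gives k clusters), and it writes the matching-matrix quantities as T_k = Σ m_ij² - n, P_k = Σ m_i·² - n and Q_k = Σ m_·j² - n, so that B_k = T_k / √(P_k Q_k). The expectation E(B_k) = √(P_k Q_k) / (n(n - 1)) follows from treating [m_ij] as random with fixed margins; the paper's variance formula is not reproduced here. The Rand (1971) index is included for comparison, computed from the same counts.

```python
from collections import Counter
from math import sqrt


def pair_counts(labels_a, labels_b):
    """Return (n, T, P, Q) from the matching matrix of two flat clusterings.

    m_ij (stored sparsely in a Counter) is the number of objects placed in
    cluster i by the first clustering and cluster j by the second.
    T counts object pairs joined in both clusterings, P pairs joined in the
    first, Q pairs joined in the second (each count doubled).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same objects")
    n = len(labels_a)
    m = Counter(zip(labels_a, labels_b))  # matching matrix [m_ij]
    t = sum(c * c for c in m.values()) - n
    p = sum(c * c for c in Counter(labels_a).values()) - n  # row margins
    q = sum(c * c for c in Counter(labels_b).values()) - n  # column margins
    return n, t, p, q


def b_k(labels_a, labels_b):
    """Fowlkes-Mallows similarity T / sqrt(P * Q) of two flat clusterings."""
    _, t, p, q = pair_counts(labels_a, labels_b)
    if p == 0 or q == 0:
        raise ValueError("B_k is undefined when a clustering is all singletons")
    return t / sqrt(p * q)


def expected_b_k(labels_a, labels_b):
    """E(B_k) when [m_ij] is random with its margins held fixed."""
    n, _, p, q = pair_counts(labels_a, labels_b)
    return sqrt(p * q) / (n * (n - 1))


def rand_index(labels_a, labels_b):
    """Rand (1971): fraction of object pairs on which the clusterings agree."""
    n, t, p, q = pair_counts(labels_a, labels_b)
    pairs = n * (n - 1) / 2
    return (pairs + t - (p + q) / 2) / pairs


def cut_tree(merges, n, k):
    """Flat labels for n objects after applying the first n - k merges.

    `merges` is a hypothetical tree encoding: (a, b) object-index pairs in
    agglomeration order, each joining the clusters that hold a and b.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in merges[: n - k]:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]


def bk_curve(merges_a, merges_b, n):
    """(k, B_k) pairs for k = 2, ..., n - 1, ready for plotting."""
    return [
        (k, b_k(cut_tree(merges_a, n, k), cut_tree(merges_b, n, k)))
        for k in range(2, n)
    ]


if __name__ == "__main__":
    # Two hypothetical 6-object trees that agree on their first two merges.
    tree_a = [(0, 1), (2, 3), (4, 5), (0, 2), (0, 4)]
    tree_b = [(0, 1), (2, 3), (3, 4), (0, 2), (0, 5)]
    for k, bk in bk_curve(tree_a, tree_b, 6):
        la, lb = cut_tree(tree_a, 6, k), cut_tree(tree_b, 6, k)
        print(k, round(bk, 3), round(expected_b_k(la, lb), 3))
```

Plotting B_k and E(B_k) against k gives a (k, B_k) plot of the kind the abstract describes. With SciPy, a flat cut can instead be taken from a linkage matrix via scipy.cluster.hierarchy.fcluster(Z, t=k, criterion='maxclust'); note that this yields at most k clusters, and possibly fewer when merge heights tie.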

Keywords

Similarity (geometry); Matching (statistics); Mathematics; Hierarchical clustering; Tree (set theory); Measure (data warehouse); Variance (accounting); Sampling (signal processing); Cluster analysis; Monte Carlo method; Series (stratigraphy); Statistics; Matrix (chemical analysis); Combinatorics; Data mining; Computer science; Artificial intelligence; Chemistry; Chromatography; Biology

Publication Info

Year: 1983
Type: article
Volume: 78
Issue: 383
Pages: 553-569
Citations: 1433
Access: Closed

Cite This

Edward B. Fowlkes, C. L. Mallows (1983). A Method for Comparing Two Hierarchical Clusterings. Journal of the American Statistical Association, 78(383), 553-569. https://doi.org/10.1080/01621459.1983.10478008

Identifiers

DOI: 10.1080/01621459.1983.10478008