Abstract

We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between the types, as observed in co-occurrence data. In this scheme, multiple clustering systems are generated to maximize an objective function that combines the pairwise mutual information between cluster variables. To implement this idea, we propose an algorithm that interleaves top-down clustering of some variables and bottom-up clustering of the other variables, with a local optimization correction routine. Focusing on document clustering, we present an extensive empirical study of two-way, three-way and four-way applications of our scheme using six real-world datasets, including the 20 Newsgroups (20NG) and the Enron email collection. Our multi-way distributional clustering (MDC) algorithms consistently and significantly outperform previous state-of-the-art information-theoretic clustering algorithms.
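The objective and the correction routine can be illustrated with a small sketch. Assuming a co-occurrence count matrix for each interacting pair of variable types, the code below scores a set of clusterings by summing the mutual information between each pair of cluster variables, then runs a greedy pass that reassigns one element at a time whenever the move raises the objective. The function names (`cluster_joint`, `mutual_information`, `mdc_objective`, `correct`), the unweighted sum over pairs, and the greedy reassignment rule are illustrative assumptions, not the authors' implementation; the interleaved top-down splitting and bottom-up merging are not shown.

```python
import math

def cluster_joint(counts, row_assign, col_assign, n_row, n_col):
    """Sum a co-occurrence count matrix into an n_row x n_col cluster-level table."""
    joint = [[0.0] * n_col for _ in range(n_row)]
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            joint[row_assign[i]][col_assign[j]] += c
    return joint

def mutual_information(joint):
    """Mutual information (in nats) of the distribution given by a count table."""
    total = sum(sum(row) for row in joint)
    if total == 0:
        return 0.0
    p_row = [sum(row) / total for row in joint]
    p_col = [sum(row[b] for row in joint) / total for b in range(len(joint[0]))]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, c in enumerate(row):
            if c > 0:  # zero cells contribute nothing to the sum
                p = c / total
                mi += p * math.log(p / (p_row[a] * p_col[b]))
    return mi

def mdc_objective(assign, k, edges):
    """Assumed objective: unweighted sum of I(X~_u; X~_v) over interacting pairs."""
    return sum(
        mutual_information(cluster_joint(counts, assign[u], assign[v], k[u], k[v]))
        for (u, v), counts in edges.items()
    )

def correct(assign, k, edges, var, max_sweeps=10):
    """Hypothetical greedy correction for variable type `var`: move each element
    to the cluster that most increases the objective; stop after a sweep with no moves."""
    best = mdc_objective(assign, k, edges)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(assign[var])):
            original = assign[var][i]
            best_c = original
            for c in range(k[var]):
                assign[var][i] = c
                score = mdc_objective(assign, k, edges)
                if score > best + 1e-12:
                    best, best_c = score, c
            assign[var][i] = best_c  # keep the best cluster found for element i
            moved = moved or best_c != original
        if not moved:
            break
    return best

# Toy three-way data (made up): 4 documents x 4 words, and 4 documents x 2 authors.
doc_word = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
doc_author = [[1, 0], [1, 0], [0, 1], [0, 1]]
edges = {("docs", "words"): doc_word, ("docs", "authors"): doc_author}
k = {"docs": 2, "words": 2, "authors": 2}
assign = {"docs": [0, 1, 0, 1], "words": [0, 0, 1, 1], "authors": [0, 1]}

score = correct(assign, k, edges, "docs")
print(assign["docs"], round(score, 4))  # [1, 1, 0, 0] 1.3863
```

Starting from a document clustering that carries no information about words or authors (objective 0), the greedy pass recovers the two document blocks, and each pairwise term reaches log 2. Recomputing the full objective for every candidate move keeps the sketch short but is far too slow for real corpora; an efficient version would update only the affected rows of each cluster-level table.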

Keywords

Cluster analysis, Pairwise comparison, Computer science, Data mining, Correlation clustering, Constrained clustering, CURE data clustering algorithm, Scheme (mathematics), Artificial intelligence, Machine learning, Mathematics

Publication Info

Year: 2005
Type: article
Pages: 41-48
Citations: 104
Access: Closed

Cite This

Ron Bekkerman, Ran El-Yaniv, Andrew McCallum (2005). Multi-way distributional clustering via pairwise interactions. pp. 41-48. https://doi.org/10.1145/1102351.1102357

Identifiers

DOI: 10.1145/1102351.1102357