Multi-way distributional clustering via pairwise interactions

Abstract

We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between the types, as observed in co-occurrence data. In this scheme, multiple clustering systems are generated with the aim of maximizing an objective function that measures multiple pairwise mutual information between cluster variables. To implement this idea, we propose an algorithm that interleaves top-down clustering of some variables and bottom-up clustering of the other variables, with a local optimization correction routine. Focusing on document clustering, we present an extensive empirical study of two-way, three-way and four-way applications of our scheme using six real-world datasets, including the 20 Newsgroups (20NG) and the Enron email collection. Our multi-way distributional clustering (MDC) algorithms consistently and significantly outperform previous state-of-the-art information-theoretic clustering algorithms.
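The objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only evaluates a sum of pairwise mutual information terms between cluster variables for a fixed clustering, and it leaves out the interleaved top-down/bottom-up search and the correction routine. The function names (`cluster_joint`, `mutual_information`, `pairwise_objective`), the unweighted sum, and the toy count table are assumptions made for illustration.

```python
from math import log

def cluster_joint(counts, row_labels, col_labels):
    """Aggregate a co-occurrence count table into a cluster-level joint distribution."""
    k_rows = max(row_labels) + 1
    k_cols = max(col_labels) + 1
    agg = [[0.0] * k_cols for _ in range(k_rows)]
    total = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            agg[row_labels[i]][col_labels[j]] += c
            total += c
    return [[v / total for v in row] for row in agg]

def mutual_information(joint):
    """Mutual information (in nats) of a joint probability table."""
    p_row = [sum(row) for row in joint]
    p_col = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * log(p / (p_row[i] * p_col[j]))
    return mi

def pairwise_objective(tables, labels):
    """Sum of mutual information over all observed pairs of variable types.

    tables: {(type_a, type_b): count matrix}  (hypothetical input layout)
    labels: {type: list of cluster ids, one per element of that type}
    """
    return sum(
        mutual_information(cluster_joint(counts, labels[a], labels[b]))
        for (a, b), counts in tables.items()
    )

# Toy document-word table with two obvious topical blocks (made-up data).
tables = {("docs", "words"): [[2, 2, 0, 0],
                              [2, 2, 0, 0],
                              [0, 0, 2, 2],
                              [0, 0, 2, 2]]}
good = pairwise_objective(tables, {"docs": [0, 0, 1, 1], "words": [0, 0, 1, 1]})
bad = pairwise_objective(tables, {"docs": [0, 1, 0, 1], "words": [0, 0, 1, 1]})
```

In this toy case the clustering that respects the block structure scores higher (`good`) than one that mixes the blocks (`bad`), which is the behaviour a search procedure maximizing such an objective would exploit. A three-way or four-way setting would add further entries to `tables`, e.g. a hypothetical `("docs", "authors")` table.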

Keywords

Cluster analysis, Pairwise comparison, Computer science, Data mining, Correlation clustering, Constrained clustering, CURE data clustering algorithm, Scheme (mathematics), Artificial intelligence, Machine learning, Mathematics

Related Publications

Scaling clustering algorithms to large databases

Patricia Bradley, Usama M. Fayyad, Cory Reina

Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable cluste...

1998, 707 citations

Multi-way Clustering on Relation Graphs

Arindam Banerjee, Sugato Basu, Srujana Merugu

A number of real-world domains such as social networks and e-commerce involve heterogeneous data that describes relations between multiple classes of entities. Understanding the ...

2007, 125 citations

Information-theoretic co-clustering

Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha

Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in continge...

2003, 361 citations

Comment-based multi-view clustering of web 2.0 items

Xiangnan He, Min‐Yen Kan, Peichu Xie +1 more

Clustering Web 2.0 items (i.e., web resources like videos, images) into semantic groups benefits many applications, such as organizing items, generating meaningful tags and impr...

2014, 99 citations

Multiple sequence alignment with hierarchical clustering

F. Corpet

An algorithm is presented for the multiple alignment of sequences, either proteins or nucleic acids, that is both accurate and easy to use on microcomputers. The approach is bas...

1988, Nucleic Acids Research, 5327 citations

Publication Info

Year: 2005
Type: article
Pages: 41-48
Citations: 104
Access: Closed

Cite This

APA Style

Ron Bekkerman, Ran El‐Yaniv, Andrew McCallum (2005). Multi-way distributional clustering via pairwise interactions. 41-48. https://doi.org/10.1145/1102351.1102357

Identifiers

DOI: 10.1145/1102351.1102357