Multi-way Clustering on Relation Graphs

Abstract

A number of real-world domains such as social networks and e-commerce involve heterogeneous data that describes relations between multiple classes of entities.Understanding the natural structure of this type of heterogeneous relational data is essential both for exploratory analysis and for performing various predictive modeling tasks.In this paper, we propose a principled multi-way clustering framework for relational data, wherein different types of entities are simultaneously clustered based not only on their intrinsic attribute values, but also on the multiple relations between the entities.To achieve this, we introduce a relation graph model that describes all the known relations between the different entity classes, in which each relation between a given set of entity classes is represented in the form of multi-modal tensor over an appropriate domain.Our multi-way clustering formulation is driven by the objective of capturing the maximal "information" in the original relation graph, i.e., accurately approximating the set of tensors corresponding to the various relations.This formulation is applicable to all Bregman divergences (a broad family of loss functions that includes squared Euclidean distance, KL-divergence), and also permits analysis of mixed data types using convex combinations of appropriate Bregman loss functions.Furthermore, we present a large family of structurally different multi-way clustering schemes that preserve various linear summary statistics of the original data.We accomplish the above generalizations by extending a recently proposed key theoretical result, namely the minimum Bregman information principle [1], to the relation graph setting.We also describe an efficient multi-way clustering algorithm based on alternate minimization that generalizes a number of other recently proposed clustering methods.Empirical results on datasets obtained from real-world domains (e.g., movie recommendations, newsgroup articles) demonstrate the generality and efficacy of our framework.

Keywords

Bregman divergenceCluster analysisRelation (database)Theoretical computer scienceComputer scienceGraphMathematicsData miningArtificial intelligence

Affiliated Institutions

Related Publications

Learning Eigenfunctions Links Spectral Embedding and Kernel PCA

Yoshua Bengio , Olivier Delalleau , Nicolas Le Roux +3 more

In this letter, we show a direct relation between spectral embedding methods and kernel principal components analysis and how both are special cases of a more general learning p...

2004 Neural Computation 335 citations

The Development of Constructivist Grounded Theory

Jane Mills , Ann Bonner , Karen Francis

Constructivist grounded theory is a popular method for research studies primarily in the disciplines of psychology, education, and nursing. In this article, the authors aim to l...

2006 International Journal of Qualitative ... 1568 citations

Graph embedding and extensions: a general framework for dimensionality reduction.

Dong Xu , Benyu Zhang , Hongjiang Zhang +1 more

Over the past few decades, a large family of algorithms - supervised or unsupervised; stemming from statistics or geometry theory - has been designed to provide different soluti...

2007 PubMed 381 citations

Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays

Uri Alon , Naama Barkai , Daniel A. Notterman +4 more

Oligonucleotide arrays can provide a broad picture of the state of the cell, by monitoring the expression level of thousands of genes at the same time. It is of interest to deve...

1999 Proceedings of the National Academy o... 4182 citations

Saturation in qualitative research: exploring its conceptualization and operationalization

Benjamin Saunders , Julius Sim , Tom Kingstone +5 more

Saturation has attained widespread acceptance as a methodological principle in qualitative research. It is commonly taken to indicate that, on the basis of the data that have be...

2017 Quality & Quantity 10074 citations

Publication Info

Year: 2007
Type: article
Pages: 145-156
Citations: 125
Access: Closed

External Links

View on DOI.org

Social Impact

Altmetric

Multi-way Clustering on Relation Graphs

PlumX Metrics

Social media, news, blog, policy document mentions

Citation Metrics

125

OpenAlex

Cite This

APA Style

                            
                                    Arindam Banerjee, 
                                
                                    Sugato Basu, 
                                
                                    Srujana Merugu
                                
                            (2007). 
                            Multi-way Clustering on Relation Graphs. 
                            
                            , 145-156.
                            https://doi.org/10.1137/1.9781611972771.14

Identifiers

DOI: 10.1137/1.9781611972771.14