Abstract

A number of real-world domains such as social networks and e-commerce involve heterogeneous data that describes relations between multiple classes of entities.Understanding the natural structure of this type of heterogeneous relational data is essential both for exploratory analysis and for performing various predictive modeling tasks.In this paper, we propose a principled multi-way clustering framework for relational data, wherein different types of entities are simultaneously clustered based not only on their intrinsic attribute values, but also on the multiple relations between the entities.To achieve this, we introduce a relation graph model that describes all the known relations between the different entity classes, in which each relation between a given set of entity classes is represented in the form of multi-modal tensor over an appropriate domain.Our multi-way clustering formulation is driven by the objective of capturing the maximal "information" in the original relation graph, i.e., accurately approximating the set of tensors corresponding to the various relations.This formulation is applicable to all Bregman divergences (a broad family of loss functions that includes squared Euclidean distance, KL-divergence), and also permits analysis of mixed data types using convex combinations of appropriate Bregman loss functions.Furthermore, we present a large family of structurally different multi-way clustering schemes that preserve various linear summary statistics of the original data.We accomplish the above generalizations by extending a recently proposed key theoretical result, namely the minimum Bregman information principle [1], to the relation graph setting.We also describe an efficient multi-way clustering algorithm based on alternate minimization that generalizes a number of other recently proposed clustering methods.Empirical results on datasets obtained from real-world domains (e.g., movie recommendations, newsgroup articles) demonstrate the generality and efficacy of our framework.

Keywords

Bregman divergenceCluster analysisRelation (database)Theoretical computer scienceComputer scienceGraphMathematicsData miningArtificial intelligence

Affiliated Institutions

Related Publications

Publication Info

Year
2007
Type
article
Pages
145-156
Citations
125
Access
Closed

External Links

Social Impact

Altmetric

Social media, news, blog, policy document mentions

Citation Metrics

125
OpenAlex

Cite This

Arindam Banerjee, Sugato Basu, Srujana Merugu (2007). Multi-way Clustering on Relation Graphs. , 145-156. https://doi.org/10.1137/1.9781611972771.14

Identifiers

DOI
10.1137/1.9781611972771.14