Abstract
A number of real-world domains such as social networks and e-commerce involve heterogeneous data that describes relations between multiple classes of entities.Understanding the natural structure of this type of heterogeneous relational data is essential both for exploratory analysis and for performing various predictive modeling tasks.In this paper, we propose a principled multi-way clustering framework for relational data, wherein different types of entities are simultaneously clustered based not only on their intrinsic attribute values, but also on the multiple relations between the entities.To achieve this, we introduce a relation graph model that describes all the known relations between the different entity classes, in which each relation between a given set of entity classes is represented in the form of multi-modal tensor over an appropriate domain.Our multi-way clustering formulation is driven by the objective of capturing the maximal "information" in the original relation graph, i.e., accurately approximating the set of tensors corresponding to the various relations.This formulation is applicable to all Bregman divergences (a broad family of loss functions that includes squared Euclidean distance, KL-divergence), and also permits analysis of mixed data types using convex combinations of appropriate Bregman loss functions.Furthermore, we present a large family of structurally different multi-way clustering schemes that preserve various linear summary statistics of the original data.We accomplish the above generalizations by extending a recently proposed key theoretical result, namely the minimum Bregman information principle [1], to the relation graph setting.We also describe an efficient multi-way clustering algorithm based on alternate minimization that generalizes a number of other recently proposed clustering methods.Empirical results on datasets obtained from real-world domains (e.g., movie recommendations, newsgroup articles) demonstrate the generality and efficacy of our framework.
Keywords
Affiliated Institutions
Related Publications
Learning Eigenfunctions Links Spectral Embedding and Kernel PCA
In this letter, we show a direct relation between spectral embedding methods and kernel principal components analysis and how both are special cases of a more general learning p...
The Development of Constructivist Grounded Theory
Constructivist grounded theory is a popular method for research studies primarily in the disciplines of psychology, education, and nursing. In this article, the authors aim to l...
Graph embedding and extensions: a general framework for dimensionality reduction.
Over the past few decades, a large family of algorithms - supervised or unsupervised; stemming from statistics or geometry theory - has been designed to provide different soluti...
Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays
Oligonucleotide arrays can provide a broad picture of the state of the cell, by monitoring the expression level of thousands of genes at the same time. It is of interest to deve...
Saturation in qualitative research: exploring its conceptualization and operationalization
Saturation has attained widespread acceptance as a methodological principle in qualitative research. It is commonly taken to indicate that, on the basis of the data that have be...
Publication Info
- Year
- 2007
- Type
- article
- Pages
- 145-156
- Citations
- 125
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1137/1.9781611972771.14