Modelling Relational Data using Bayesian Clustered Tensor Factorization

Abstract

We consider the problem of learning probabilistic models for complex relational structures between various types of objects. A model can help us “understand ” a dataset of relational facts in at least two ways, by finding interpretable structure in the data, and by supporting predictions, or inferences about whether particular unobserved relations are likely to be true. Often there is a tradeoff between these two aims: cluster-based models yield more easily interpretable representations, while factorization-based approaches have given better predictive performance on large data sets. We introduce the Bayesian Clustered Tensor Factorization (BCTF) model, which embeds a factorized representation of relations in a nonparametric Bayesian clustering framework. Inference is fully Bayesian but scales well to large data sets. The model simultaneously discovers interpretable clusters and yields predictive performance that matches or beats previous probabilistic models for relational data. 1

Keywords

Computer scienceStatistical relational learningCluster analysisInferenceBayesian probabilityArtificial intelligenceMachine learningRelational databaseProbabilistic logicData miningBayesian inferenceRepresentation (politics)FactorizationAlgorithm

Affiliated Institutions

Related Publications

Probabilistic graphical models : principles and techniques

Daniel L. Koller , Nir Friedman

Most tasks require a person or an automated system to reason -- to reach conclusions based on available information. The framework of probabilistic graphical models, presented i...

2009 6434 citations

Introduction to Statistical Relational Learning

Lise Getoor , Ben Taskar

Advanced statistical modeling and knowledge representation techniques for a newly emerging area of machine learning and probabilistic reasoning; includes introductory material, ...

2007 The MIT Press eBooks 1630 citations

A latent factor model for highly multi-relational data

Rodolphe Jenatton , Nicolas Le Roux , Antoine Bordes +1 more

Many data such as social networks, movie preferences or knowledge bases are multi-relational, in that they describe multiple relations between entities. While there is a large b...

2012 HAL (Le Centre pour la Communication ... 366 citations

MULTIVARIABLE PROGNOSTIC MODELS: ISSUES IN DEVELOPING MODELS, EVALUATING ASSUMPTIONS AND ADEQUACY, AND MEASURING AND REDUCING ERRORS

Frank E. Harrell , Kerry L. Lee , Daniel B. Mark

Multivariable regression models are powerful tools that are used frequently in studies of clinical outcomes. These models can use a mixture of categorical and continuous variabl...

1996 Statistics in Medicine 9497 citations

On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)

Scott Lundberg , Su‐In Lee

In an industrial maintenance context, degradation diagnosis is the problem of determining the current level of degradation of operating machines based on measurements. With the ...

2024 Dagstuhl Research Online Publication ... 12892 citations

Publication Info

Year: 2009
Type: article
Volume: 22
Pages: 1821-1828
Citations: 241
Access: Closed

External Links

Citation Metrics

241

OpenAlex

Cite This

APA Style

                            
                                    Ilya Sutskever, 
                                
                                    Joshua B. Tenenbaum, 
                                
                                    Ruslan Salakhutdinov
                                
                            (2009). 
                            Modelling Relational Data using Bayesian Clustered Tensor Factorization. 
                            DSpace@MIT (Massachusetts Institute of Technology)
                            , 22
                            
                            , 1821-1828.