Clustering by Passing Messages Between Data Points

Brendan J. Frey; Delbert Dueck

doi:10.1126/science.1136800

Abstract

Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such “exemplars” can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called “affinity propagation,” which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. We used affinity propagation to cluster images of faces, detect genes in microarray data, identify representative sentences in this manuscript, and identify cities that are efficiently accessed by airline travel. Affinity propagation found clusters with much lower error than other methods, and it did so in less than one-hundredth the amount of time.
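The message exchange described in the abstract can be illustrated with a small sketch. The abstract itself does not give the update rules; the two message types used here (responsibilities and availabilities) follow the commonly cited formulation from the full paper. The damping factor, iteration count, toy data, and preference value of -10 are illustrative assumptions, not values from the source.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Minimal sketch. S is an n x n list of lists of similarities (n >= 2);
    the diagonal S[k][k] holds each point's preference for being an exemplar."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                new = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the candidate k that maximizes a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data (assumed for illustration): two well-separated groups on a line.
points = [0.0, 0.2, 0.4, 10.0, 10.2, 10.4]
preference = -10.0  # hypothetical value; lower preferences yield fewer exemplars
S = [[preference if i == k else -(points[i] - points[k]) ** 2
      for k in range(len(points))] for i in range(len(points))]
labels = affinity_propagation(S)
print(labels)  # index of the exemplar chosen by each point
```

Similarity here is the negative squared distance, a common choice for real-valued data. The cubic-time loops keep the sketch readable; they are not suited to large inputs.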

Keywords

Affinity propagation, Cluster analysis, Computer science, Similarity (geometry), Data mining, Set (abstract data type), Data set, Data point, Cluster (spacecraft), Pattern recognition (psychology), Artificial intelligence, Fuzzy clustering, CURE data clustering algorithm, Image (mathematics)

Affiliated Institutions

University of Toronto CA

Related Publications

Approximation schemes for clustering problems

W. Fernandez de la Véga, Marek Karpiński, Claire Kenyon +1 more

Let k be a fixed integer. We consider the problem of partitioning an input set of points endowed with a distance function into k clusters. We give polynomial time approximation ...

2003 158 citations

CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure

Mattias Jakobsson, Noah A. Rosenberg

Motivation: Clustering of individuals into populations on the basis of multilocus genotypes is informative in a variety of settings. In population-genetic clustering al...

2007 Bioinformatics 6282 citations

REVIGO Summarizes and Visualizes Long Lists of Gene Ontology Terms

Fran Supek, Matko Bošnjak, Nives Škunca +1 more

Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). Th...

2011 PLoS ONE 6566 citations

Kernel k-means

Inderjit S. Dhillon, Yuqiang Guan, Brian Kulis

Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have ...

2004 1184 citations

Large-Scale Clustering of cDNA-Fingerprinting Data

Ralf Herwig, Albert J. Poustka, Christine H. Müller +3 more

Clustering is one of the main mathematical challenges in large-scale gene expression analysis. We describe a clustering procedure based on a sequential k-means algorithm with a...

1999 Genome Research 219 citations

Publication Info

Year: 2007
Type: article
Volume: 315
Issue: 5814
Pages: 972-976
Citations: 6739
Access: Closed

Citation Metrics

6739

OpenAlex

Cite This

APA Style

Frey, B. J., & Dueck, D. (2007). Clustering by Passing Messages Between Data Points. Science, 315(5814), 972-976. https://doi.org/10.1126/science.1136800

Identifiers

DOI: 10.1126/science.1136800