Abstract
A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods. In this paper, we discuss an equivalence between the objective functions used in these seemingly different methods--in particular, a general weighted kernel k-means objective is mathematically equivalent to a weighted graph clustering objective. We exploit this equivalence to develop a fast, high-quality multilevel algorithm that directly optimizes various weighted graph clustering objectives, such as the popular ratio cut, normalized cut, and ratio association criteria. This eliminates the need for any eigenvector computation for graph clustering problems, which can be prohibitive for very large graphs. Previous multilevel graph partitioning methods, such as Metis, have suffered from the restriction of equal-sized clusters; our multilevel algorithm removes this restriction by using kernel k-means to optimize weighted graph cuts. Experimental results show that our multilevel algorithm outperforms a state-of-the-art spectral clustering algorithm in terms of speed, memory usage, and quality. We demonstrate that our algorithm is applicable to large-scale clustering tasks such as image segmentation, social network analysis and gene network analysis.
Keywords
Affiliated Institutions
Related Publications
Kernel k-means
Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have ...
New spectral methods for ratio cut partitioning and clustering
Partitioning of circuit netlists in VLSI design is considered. It is shown that the second smallest eigenvalue of a matrix derived from the netlist gives a provably good approxi...
On the Quality of Spectral Separators
Computing graph separators is an important step in many graph algorithms. A popular technique for finding separators involves spectral methods. However, there has not been much ...
An efficient eigenvector approach for finding netlist partitions
A fast eigenvector technique for obtaining good initial node partitions of netlists for use in interchange heuristics is described. The method is based on approximating the netl...
On coresets for k-means and k-median clustering
In this paper, we show the existence of small coresets for the problems of computing k-median and k-means clustering for points in low dimension. In other words, we show that gi...
Publication Info
- Year
- 2007
- Type
- article
- Volume
- 29
- Issue
- 11
- Pages
- 1944-1957
- Citations
- 1016
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1109/tpami.2007.1115