Publication Details

Title: Empirical Observations of Probabilistic Heuristics for the Clustering Problem
Author: J. Bilmes, A. Vahdat, W. Hsu, and E.-J. Im
Group: ICSI Technical Reports
Date: May 1997
PDF: ftp://ftp.icsi.berkeley.edu/pub/techreports/1997/tr-97-018.pdf

Overview:
We empirically investigate a number of strategies for solving the clustering problem under the minimum variance error criterion. First, we compare the behavior of four algorithms, 1) randomized minimum spanning tree, 2) hierarchical grouping, 3) randomized maximum cut, and 4) standard k-means. We test these algorithms with a large corpus of both contrived and real-world data sets and find that standard k-means performs best. We found, however, that standard k-means can, with non-negligible probability, do a poor job optimizing the minimum variance criterion. We therefore investigate various randomized k-means modifications. We empirically find that by running randomized k-means only a modest number of times, the probability of a poor solution becomes negligible. Using a large number of CPU hours to experimentally derive the apparently optimal solutions, we also find that randomized k-means has the best rate of convergence to this apparent optimum.

Bibliographic Information:
ICSI Technical Report TR-97-018

Bibliographic Reference:
J. Bilmes, A. Vahdat, W. Hsu, and E.-J. Im. Empirical Observations of Probabilistic Heuristics for the Clustering Problem. ICSI Technical Report TR-97-018, May 1997