Empirical Observations of Probabilistic Heuristics for the Clustering Problem

TitleEmpirical Observations of Probabilistic Heuristics for the Clustering Problem
Publication TypeTechnical Report
Year of Publication1997
AuthorsBilmes, J. A., Vahdat A., Hsu W., & Im E-J.
Other Numbers1084

We empirically investigate a number of strategies for solving the clustering problem under the minimum variance error criterion. First, we compare the behavior of four algorithms, 1) randomized minimum spanning tree, 2) hierarchical grouping, 3) randomized maximum cut, and 4) standard k-means. We test these algorithms with a large corpus of both contrived and real-world data sets and find that standard k-means performs best. We found, however, that standard k-means can, with non-negligible probability, do a poor job optimizing the minimum variance criterion. We therefore investigate various randomized k-means modifications. We empirically find that by running randomized k-means only a modest number of times, the probability of a poor solution becomes negligible. Using a large number of CPU hours to experimentally derive the apparently optimal solutions, we also find that randomized k-means has the best rate of convergence to this apparent optimum.

Bibliographic Notes

ICSI Technical Report TR-97-018

Abbreviated Authors

J. Bilmes, A. Vahdat, W. Hsu, and E.-J. Im

ICSI Publication Type

Technical Report