Empirical Observations of Probabilistic Heuristics for the Clustering Problem
Title | Empirical Observations of Probabilistic Heuristics for the Clustering Problem |
Publication Type | Technical Report |
Year of Publication | 1997 |
Authors | Bilmes, J. A., Vahdat A., Hsu W., & Im E-J. |
Other Numbers | 1084 |
Abstract | We empirically investigate a number of strategies for solving the clustering problem under the minimum variance error criterion. First, we compare the behavior of four algorithms, 1) randomized minimum spanning tree, 2) hierarchical grouping, 3) randomized maximum cut, and 4) standard k-means. We test these algorithms with a large corpus of both contrived and real-world data sets and find that standard k-means performs best. We found, however, that standard k-means can, with non-negligible probability, do a poor job optimizing the minimum variance criterion. We therefore investigate various randomized k-means modifications. We empirically find that by running randomized k-means only a modest number of times, the probability of a poor solution becomes negligible. Using a large number of CPU hours to experimentally derive the apparently optimal solutions, we also find that randomized k-means has the best rate of convergence to this apparent optimum. |
URL | http://www.icsi.berkeley.edu/ftp/global/pub/techreports/1997/tr-97-018.pdf |
Bibliographic Notes | ICSI Technical Report TR-97-018 |
Abbreviated Authors | J. Bilmes, A. Vahdat, W. Hsu, and E.-J. Im |
ICSI Publication Type | Technical Report |