true pt
Olfa Nasraoui
Department of Computer Engineering
& Computer Science
University of Louisville,
olfa.nasraoui_AT_louisville.edu
One major limitation of many classical clustering algorithms is that they assume that the number of clusters is known.
However, in practice, the number of clusters may not be known. This problem is sometimes called unsupervised clustering. Unsupervised prototype-based clustering aims at determining the correct number of clusters,
, without any prior knowledge about it, using one of four approaches. The first approach is to proceed by repeating the clustering for several
values at a high computational cost, and using a validity measure to choose the best partiton. The second approach is to perform several passes through the data set, seeking one cluster at a time and then removing from the data set of the next pass the points belonging to a found cluster if it passes a validity test as in the GMVE [JMB91]. The problem with these approaches lies in the difficulty in designing validity measures that can truly evaluate the goodness of fit of a given cluster or partition because they usually require setting thresholds that can widely vary in practice. Also, most validity measures either assume a known underlying inlier or noise distribution or are very sensitive to noise, and hence are not appropriate for general robust clustering. The third approach consists of starting the clustering process with an overspecified number of clusters, and then merging similar clusters and eliminating spurious clusters until the correct number of clusters is left as in Compatible Cluster Merging [KF92]. The fourth and most recent approach is based on Competitive Agglomeration [FK97], which starts by partitioning the data set into an overspecified number of clusters. Then, as the clustering progresses, adjacent clusters compete against each other for data points, and clusters that lose in the competition gradually become depleted and vanish.
This document was generated using the LaTeX2HTML translator Version 2002 (1.62)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 0 -image_type gif UnsupervisedClustering.tex