true pt
A Brief Overview of Robust Clustering Techniques
Olfa Nasraoui
Department of Computer Engineering
& Computer Science
University of Louisville,
olfa.nasraoui_AT_louisville.edu
There are two major families of robust clustering methods. The first includes techniques which are directly based on robust statistics. Rousseeuw extended the idea of robust estimators to clusters with his Medoid
algorithm [RL87]. However this
extension is based on the Median estimator (also known as regression) which
minimizes the sum of absolute (instead of
squared) residuals. For the case of estimating the parameters of a regression
model, this estimator is resistant to outliers
in the dependent variable, and enjoys a breakdown point for location estimation. However, for regression with multivariate
data, it is no better than regular LS in terms of its breakdown point (),
because of its sensitivity to leverage points. The Generalized MVE (GMVE) estimator [JMB91] searches for clusters one at a time by repeatedly performing MVE followed by onestep of RLS after fixing the weights to 1 if a data point's Mahalanobis distance from the center is less than
and 0 otherwise. In each pass, the KolmogorovSmirnov normality test is used to validate the fit of each cluster found before removing from the data set the points with weights equal to 1 in that cluster. Instead of fixing the fraction of inliers, , to as in the original MVE, GMVE searches all possible values of within a prespecified interval with a suitable step size, to find each cluster. Hence, it is computationally quite expensive. GMVE has the advantage that the number of clusters need not be known in advance. However, it can not handle overlapping clusters.
The second family of robust clustering algorithms is based on modifying the objective of FCM to make the parameter estimates more resistant to noise. The first such attempt was presented by Ohashi [Oha84] who introduced the idea of a noise cluster indicated by the subscript (*), and modified the FCM criterion as follows

(1) 
where is a parameter related to scale. Later, in an independent effort, Davé proposed a Noise Clustering (NC) technique [Dav91] which uses a criterion similar to Ohashi's,

(2) 
Both of these methods are easy to optimize. However, they had two major shortcomings. First, a single parameter is used to describe the scale or resolution parameter. This is clearly insufficient in the general clustering problem where clusters are not guaranteeed to be of the same size or to have the same inlier bounds. Second, the scale parameter needs to be known in advance, or preestimated from the data.
The Possibilistic Means (PCM) family of clustering algorithms was proposed by Krishnapuram and Keller to alleviate the noise problem by relaxing the constraint on memberships used in FCM [KK93,KK94]. It uses the objective function
where are suitable resolution parameters which are allowed to vary from cluster to cluster, hence allowing clusters to have different sizes or inlier bounds. While the parameter update equations remain identical to those of FCM, it is easy to show [KK93] that may be a global minimum of
only if the memberships are updated by

(4) 
Unlike the constrained FCM memberships, the Possibilistic memberships
can be viewed as typicality values which measure how typical a point is of a given cluster, regardless of all other clusters.
It has been shown [NK95] that the PCM can be considered as a robust estimator representing independent , and estimators [Hub81,RL87], with being the scale parameter related to the spread of cluster , and being a parameter that determines the shape of the weight function. Since PCM tries to find the best clusters independently of each other, it is possible that identical clusters minimize the PCM objective function, while the remaining clusters are missed. Another problem that has proved to be a serious bottleneck for the PCM is that its performance relies heavily on a good initialization of the cluster parameters, and accurate estimates of . When the initial prototype parameters or the resolution parameter are inaccurate, the PCM can converge to a meaningless solution. In [KK93], it was suggested that the FCM algorithm be used to obtain the initial estimates of the prototypes, as well as the . Several methods to estimate the from the initial partition were proposed [KK93]. However, these methods can fail in the presence of noise, because if the data is noisy, the initial partition from the FCM may be too poor to yield good estimates of the . Nasraoui and Krishnapuram used robust scale estimates along the lines of those used in robust statistics to initilialize the PCM in [NK96]. However, robust scale estimates still do not solve the problem of the dependence of these estimates on the accuracy of the initial prototype parameters. A better strategy to reestimate the in each iteration was also proposed in [NK96], albeit at a high computational cost, by using high breakdown scale estimates such as the median and the median of absolute deviations (MAD) [Hub81,RL87]. Also, it was recommended [NK95] that the membership function be modified so that it becomes redescending, i.e., it drops to zero beyond a finite rejection point.
Beni and Liu [BL94] proposed a Least Biased Fuzzy Clustering technique which minimizes for each cluster the clustering entropy defined by
, subject to the assumption that the centers are unbiased, i.e.,
.
The optimal centers are identical to those of the FCM, while the memberships are given by

(5) 
where is a scale or resolution parameter resulting from the Lagrange multipliers used in the minimization process, and is the norm. This approach, like NC, suffers from the fact that a single scale parameter is used for all clusters, and this parameter has to be prespecified. It is also limited by the distance measure which is only suitable for spheroidal clusters.
Recently, Frigui and Krishnapuram [FK95] proposed the Robust Prototypes (RCP) which explicitly incorporated M and Westimators into the FCM objective function as follows

(6) 
subject to the same constraints on the memberships as in the FCM. The optimal prototype parameters of the th cluster are derived by setting
resulting in a system of equations for each cluster similar to the Westimator in (), except that the contribution of each data sample inside the sum is multiplied by . RCP offers several innovative concepts such as the use of both a set of constrained fuzzy memberships to describe the degree of belonging of a data point to different clusters and a set of unconstrained robust weights for each cluster, that describe the adherence of a data point to the fit of that cluster as in Westimators. However, RCP uses the Median and MAD to estimate the scale or resolution parameter used in the definition of the weights of each cluster. This requires a heavy computational cost, in addition to assuming that the noise proportion is at the most in each cluster in each iteration.
Yager and Filev [YF94] proposed the mountain method which considers each data point as a candidate cluster center and chooses as an optimal cluster center, the data point at which a densitylike function,
, is maximized, where is a resolution parameter. The algorithm proceeds by identifying one cluster at a time, and removing identified clusters by discounting their contribution to function
of all the remaining points. The mountain method suffers from a very high polynomial complexity,
. It is also limited by its use of a single scale parameter, , for all clusters, that is assumed to be known in advance.
Davé and Krishnapuram [Da97] show the relation between many of the abovementionned clustering algorithms, and discuss the role of validity in robust clustering.
Kim et al. [KKD95] proposed the Fuzy C Trimmed Prototypes
algorithm (FCTP) which is based on the reformulated objective function of the FCM.
The reformulated objective function of FCM is

(7) 
where
, and
.
The FTCP objective function is
where and
.
They propose a heuristic algorithm to minimize and estimate .
However, this procedure assumes a good initialization.
Choi and Krishnapuram [CK96] proposed a robust version
of FCM based on the reformulated objective function of FCM.
The objective function of this algorithm is given by
This algorithm uses only one robust weight per point as opposed to
weights in the formulation of (6). The weight is interpreted as the goodness of the point.
This algorithm works only for spheroidal clusters.
The choice of the loss function and the weights are not discussed
in this paper.
We conclude from our review of existing robust estimation and clustering methods that they all suffer from the fact that they either depend on prespecified values for the scale parameters or the fraction of inliers, which makes them very sensitive to initialization; or they have to perform a quasiexhaustive search on these parameters, which makes them require a very high computational cost.
This work is supported by the National Science Foundation (CAREER Award IIS0133948 to O. Nasraoui).
Partial support of earlier stages of this work by the Office of Naval Research grant
(N0000149610439) and by the National Science Foundation Grant
IIS 9800899 is also gratefully acknowledged.
 BL94

G. Beni and X. Liu.
A least biased fuzzy clustering method.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
16(9):954960, Sep. 1994.
 CK96

Y. Choi and R. Krishnapuram.
Fuzzy and robust formulation of maximumlikelihoodbased gaussian
mixture decomposition.
In IEEE Conference on Fuzzy Systems, pages 18991905, New
Orleans, Sep. 1996.
 Da97

R. N. Davé and R. Krishnapuram and.
Robust clustering methods: A unified view.
IEEE Trans. Fuzzy Syst., 5(2):270293, 1997.
 Dav91

R. N. Davé.
Characterization and detection of noise in clustering.
Pattern Recognition Letters, 12(11):657664, 1991.
 FK95

H. Frigui and R. Krishnapuram.
A robust clustering algorithm based on the mestimator.
In Neural, Parallel and Scientific Computations, Atlanta,
Georgia, May 1995.
 Hub81

P. J. Huber.
Robust Statistics.
John Wiley & Sons, New York, 1981.
 JMB91

J. M. Jolion, P. Meer, and S. Bataouche.
Robust clustering with applications in computer vision.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
13(8):791802, Aug. 1991.
 KK93

R. Krishnapuram and J. M. Keller.
A possibilistic approach to clustering.
IEEE Trans. Fuzzy Syst., 1(2):98110, May 1993.
 KK94

R. Krishnapuram and J. M. Keller.
Fuzzy and possibilistic clustering methods for computer vision.
In S. Mitra, M. Gupta, and W. Kraske, editors, Neural and Fuzzy
Systems, pages 135159. SPIE Institute Series, 1994.
 KKD95

J. Kim, R. Krishnapuram, and R. N. Davé.
On robustifying the cmeans algorithms.
In NAFIPS/ISUMA, pages 630635, College Park, MD, Sep. 1995.
 NK95

O. Nasraoui and R. Krishnapuram.
Crisp interpretation of fuzzy and possibilistic clustering
algorithms.
In 3rd European Congress on Intelligent Techniques and Soft
Computing, volume 3, pages 13121318, Syracuse NY, Aug. 1995.
 NK96

O. Nasraoui and R. Krishnapuram.
An improved possibilistic cmeans algorithm with finite rejection and
robust scale estimation.
In North American Fuzzy Information Processing Society
Conference, Berkeley, California, June. 1996.
 Oha84

Y. Ohashi.
Fuzzy clustering and robust estimation.
Presentation at the 9th meeting of SAS Users Group
International, 1984.
 RL87

P. J. Rousseeuw and A. M. Leroy.
Robust Regression and Outlier Detection.
John Wiley & Sons, New York, 1987.
 YF94

R. R. Yager and D. P. Filev.
Approximate clustering via the mountain method.
SMC, 24(8):12791284, 1994.
A Brief Overview of Robust Clustering Techniques
This document was generated using the
LaTeX2HTML translator Version 2002 (1.62)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html split 0 image_type gif RobustClustering.tex