Knowledge Discovery & Web Mining Lab, University of Louisville

NSF CAREER: New Clustering Algorithms Based on Robust Estimation and Genetic Niches

with Applications to Web Usage Mining


  Goals, Objectives and Targeted Activities

  Selected Developments


Area Background

Data sets
Outreach Activities

This project is supported by National Science Foundation under grant NSF IIS-0533317


Selected Developments

We developed a statistically robust and efficient algorithm for clustering and generalized it to multivariate data. Hierarchical clustering based on genetic niching showed that it is a feasible technique and promises more success compared to other techniques, for clustering web sessions. We have also developed new recommendation techniques based on Web usage mining and fuzzy approximate reasoning to take into account the inherent uncertainties in all the phases of Web personalization (see publications).

We have presented a systematic approach to automatic web recommender systems based on Web usage mining in a first stage to learn user profiles, and a second data mining phase that is devoted to learning an accurate model for predicting user requests. Our approach differs from existing methods because it includes two separate learning phases: one to learn the user profiles, and another to learn a recommendation model. Most previous approaches do not include adaptive learning in a separate second phase, and instead base the recommendations on simplistic assumptions (e.g. nearest profile recommendations, or deployment of pre-discovered association rules). We have also recently developed a new approach to generate simultaneously accurate and complete recommendations, called Context Ultra-Sensitive Approach based on two-step Recommender systems (CUSA-2-step-Rec). Our approach relies on a committee of profile-specific neural networks. This approach provides recommendations that are accurate and fast to train because only the URLs relevant to a specific profile are used to define the architecture of each network. Similar to the task of completing the missing pieces of a puzzle, each neural network is trained to predict the missing URLs of several complete ground-truth sessions from a given profile, given as input several incomplete subsessions. We compared the proposed approach with K-NN based collaborative filtering, as well as other classical methods (such as nearest profile recommendations ) showing that our approach achieves higher coverage and precision while being faster, and requiring lower main memory at recommendation time. While most recommenders are inherently context sensitive, our approach is context ultra-sensitive because a different recommendation model is designed for each profile separately.

On a separate track, we have continued our investigation into evolutionary clustering using Unsupervised Niche Clustering (UNC), achieving significant strides, specifically in (i)  performing a comprehensive evaluation of UNC under different density, noise and cluster distribution conditions, (ii) successfully applying UNC to the problem of anomaly detection for network intrusion detection, by using UNC to discover normal profiles, (iii) developing scalable models of UNC to cluster large volumes of data under stringent time and memory constraints, (iv) developing a parameter-free model of UNC that eliminates the need to specify Genetic Algorithm (GA) parameters such as crossover and mutation probabilities, and finally developing a flexible and generalized framework for evolutionary clustering based on  exchangeable modules that affect crucial mechanisms such as the selection strategy, replacement strategy, niching strategy, hybrid scale estimation, automatic adaptation of the GA parameters, and hybrid local refinement post evolution.

We have also investigated several approaches to scalable evolutionary clustering of noisy and evolving data sets based on more recently understood Evolutionary phenomena, such a new artificial immune system model, multiploid and structured chromosome representations, and gene expression mechanisms. Preliminary results have shown a promise of success. We believe that creating hybrids that tap on the strengths of these different facets of evolutionary computation, as well as the Unsupervised Niche Clustering algorithm will naturally evolve into beneficial outcomes of this project to the data mining community. Tangible products of this research will be a suite of scalable, autonomous, and reliable clustering algorithms for intensive applications such as web and stream data mining.


Go to back to Knowledge Discovery & Web Mining Lab