We conduct research to advance the state of the art in Knowledge Discovery in Data sets (KDD), with an emphasis on Data Mining, and in particular Web Mining and Stream Data Mining. Our daily activities consist of learning, investigating, designing, implementing, testing, and evaluating efficient algorithms and techniques to solve challenging problems in support of a variety of applications, such as:
- Web analytics & Web personalization for e-commerce and information retrieval
- Mining evolving data streams with an emphasis on evolving Web clickstreams
- Scalable and/or personalized information retrieval in data such as text and astronomical data sets
Recently, the world has witnessed an explosion of electronically stored data. Most organizations rely on huge databases that contain a wealth of information that, unfortunately, is not fully exploited. In fact, the increasing size of most data repositories is making access to useful information more and more difficult. Hence the saying that “The mechanical production of data has created the need for a mechanical consumption of data” is not in vain. Data mining (DM) comprises the set of intelligent tools that can be used to extract useful or interesting information, such as patterns, associations, changes, anomalies, and significant structures, from large amounts of data stored in various information repositories. Data mining inherits a legacy from diverse disciplines, including:
- Machine Learning
- Artificial Intelligence
- Pattern Recognition
- Database Systems
- Information Retrieval
Recently, several particularly challenging areas of research in data mining have emerged, including:
- Web Mining: mining web data (semi-structured to unstructured)
- Text Mining: mining text data such as in Web pages and e-mails
- Stream Data Mining: mining data that arrives in huge quantities under extremely stringent memory constraints, making it necessary to process the data in a single sequential pass (e.g., clickstream data)
- Mining Evolving Data Streams: mining data that not only arrives in huge quantities under harsh computational and space constraints, but that can also change unexpectedly.
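To make the single-pass, bounded-memory constraint of stream data mining concrete, here is a minimal sketch of one well-known technique, the Misra-Gries frequent-items summary, applied to a toy clickstream. It is offered only as an illustration of the constraint, not as a description of the lab's own algorithms; the function name `misra_gries`, the parameter `k`, and the sample URLs are assumptions made for this example.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass frequent-items summary that keeps at most k - 1 counters."""
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:  # each item is seen exactly once, in arrival order
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the ones
            # that reach zero, so memory never exceeds k - 1 entries.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical clickstream: one requested URL per event.
clicks = ["/home", "/cart", "/home", "/item/7", "/home", "/cart", "/home"]
summary = misra_gries(clicks, k=3)
print(summary)
```

The stored counts are lower bounds on the true frequencies, and any item that occurs more than n/k times in a stream of n items is guaranteed to remain in the summary. The summary never forgets old data, so it does not by itself handle streams whose distribution changes over time.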
What We Do
In our lab, we conduct research in all these challenging areas, which are often intertwined rather than separate. For example, Web Mining often involves data of different types: Web Usage data, as found in Web logs that record user navigation or clicks on a website; Structure data, as in the hyperlinks between Web pages; and Text data, as in the content of Web pages. In this setting, Text Mining can therefore be viewed as a special subset of Web Mining.
Also, contrary to most assumptions, Web data on most busy websites is highly dynamic. In particular, Web usage data possesses all the challenging characteristics of massive and evolving data streams, with one added challenge: it is of much higher dimensionality and is very sparse.
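As a rough illustration of what high dimensionality and sparsity mean for Web usage data, the sketch below represents each user session as a dictionary holding only the URLs that were actually visited, instead of a dense vector with one entry per page on the site. The session contents, the weights, and the function name `cosine_similarity` are hypothetical, and cosine similarity is used here simply as one common way to compare such sparse vectors.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    if not a or not b:
        return 0.0
    # Iterate over the smaller vector so the cost depends on the number
    # of visited URLs, not on the total number of pages on the site.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(weight * b.get(url, 0.0) for url, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sessions: only visited URLs are stored (weight 1.0 = visited).
session_1 = {"/home": 1.0, "/cart": 1.0}
session_2 = {"/home": 1.0, "/item/7": 1.0}
similarity = cosine_similarity(session_1, session_2)
print(similarity)
```

The two sessions share one of their two URLs, so the similarity comes out at approximately 0.5. On a site with thousands of pages, a typical session touches only a handful of them, which is why a sparse representation of this kind is usually preferred over dense vectors.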