We conduct research to advance the state of the art in Knowledge Discovery in Data Sets (KDD), with an emphasis on Data Mining, in particular Web Mining and Stream Data Mining. Our daily activities consist of learning, investigation, design, implementation, testing, and evaluation of efficient algorithms and techniques that solve challenging problems in support of a variety of applications, such as:
- Web analytics and Web personalization for e-commerce and information retrieval
- Mining evolving data streams, with an emphasis on evolving Web clickstreams
- Scalable and/or personalized information retrieval in data such as text and astronomical data sets
Recently, the world has witnessed an explosion of electronically stored data. Most organizations rely on huge databases that contain a wealth of information which, unfortunately, is not fully exploited. In fact, the growing size of most data repositories makes access to useful information more and more difficult. Hence the saying "The mechanical production of data has created the need for a mechanical consumption of data" rings true. Data mining (DM) comprises the set of intelligent tools that can be used to extract useful or interesting information, such as patterns, associations, changes, anomalies, and significant structures, from large amounts of data stored in various information repositories.
Data mining inherits a legacy from diverse disciplines, including:
- Machine Learning & Artificial Intelligence
- Pattern Recognition
- Statistics
- Database Systems
- Information Retrieval
Recently, several particularly challenging areas of research in data mining have emerged, including:
- Web Mining: mining Web data (ranging from semi-structured to unstructured)
- Text Mining: mining text data, such as Web pages and e-mails
- Stream Data Mining: mining data that arrives in huge quantities under extremely stringent memory constraints, which makes it necessary to process the data in a single sequential pass (e.g., clickstream data)
- Mining Evolving Data Streams: mining data that not only arrives in huge quantities under harsh computational and space constraints, but can also change unexpectedly
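To make the single-pass, fixed-memory constraint concrete, here is a minimal sketch using the Misra-Gries frequent-items summary, a standard streaming technique chosen here only as an illustration (it is not presented as this lab's method). It reads each click exactly once, never stores the stream, and keeps at most `k - 1` counters no matter how long the stream grows. The function name `misra_gries` and the sample clickstream are hypothetical.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Summarize `stream` in one sequential pass using at most k - 1 counters.

    Any item occurring more than n/k times (n = stream length) is guaranteed
    to remain in the summary; its stored count may underestimate the true
    count by at most n/k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:  # each element is seen exactly once
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Memory budget is full: decrement every counter and
            # drop those that reach zero (the new item is discarded).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical clickstream of requested URLs.
    clicks = ["/home", "/cart", "/home", "/search", "/home", "/cart", "/home"]
    print(misra_gries(clicks, k=3))  # {'/home': 3, '/cart': 1}
```

Here `/home` appears 4 times in 7 clicks, more than 7/3, so it is guaranteed to survive; its reported count of 3 is within the n/k error bound. Handling streams that *evolve* would require further machinery (for example, decaying or windowed counts), which this sketch does not attempt.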
What we do
In our lab, we conduct research in all of these challenging areas, which are often intertwined rather than separate. For example, Web Mining often involves data of different types: Web Usage data, as found in Web logs that record user navigation or clicks on a website; Structure data, as in the hyperlinks between Web pages; and Text data, as in the content of Web pages. Text Mining is therefore an integral part of Web Mining.
Also, contrary to common assumptions, Web data on most busy websites is highly dynamic. In particular, Web usage data possesses all the challenging characteristics of massive and evolving data streams, with one added challenge: it is of much higher dimensionality and is very sparse.
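The combination of high dimensionality and sparsity can be illustrated with a small sketch. Assume, for illustration only, that each user session is viewed as a vector with one dimension per URL on the site. The site may have thousands of URLs, but a session visits only a handful, so nearly every entry is zero. Storing only the visited URLs keeps memory proportional to session length, and a similarity such as cosine similarity can be computed from the shared entries alone. The function names and URLs below are hypothetical, and cosine similarity is just one common choice of measure.

```python
import math
from collections import Counter
from typing import Dict, List


def session_vector(clicks: List[str]) -> Dict[str, int]:
    """Sparse session vector: only URLs actually visited get an entry."""
    return dict(Counter(clicks))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    # Iterate over the smaller dict; only URLs present in both
    # sessions contribute to the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(v * b.get(url, 0) for url, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    s1 = session_vector(["/home", "/products", "/cart"])
    s2 = session_vector(["/home", "/products", "/checkout"])
    s3 = session_vector(["/about"])
    print(round(cosine(s1, s2), 4))  # 0.6667 (two of three URLs shared)
    print(cosine(s1, s3))            # 0.0 (no URLs in common)
```

Because the cost of each comparison depends on the number of visited URLs rather than on the total number of URLs on the site, this representation scales to very high-dimensional but sparse usage data.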