Interactive Data Discovery in Data Lakes

Abstract

As data is produced at an unprecedented rate, the need and expectation to make it easily available to end-users is growing. Dataset discovery has become an important subject in the data management community, as it is the means of providing data to the user and fulfilling an information need. Although the end-user is the one who needs the information and knows what type of information to look for, little has been done to involve the user in the discovery process.

This PhD project addresses the topic of interactive data discovery, where the user’s interests are modelled through interactions and used as context for the discovery process. We aim to develop a system that minimises the trade-off between efficiency and effectiveness, thus providing accurate results in an interactive fashion. The innovative part of the system consists of extracting the user’s interests and data needs through interactions and using them to enrich the data context and provide tailored results to the user. We describe the steps to create the models and methods that will be used in designing the prototype, and we draw on previous systems and neighbouring communities to optimise the system.