Search computing meets data extraction

Abstract

Thanks to the Web, access to an increasing wealth and variety of information has become near instantaneous. To make informed decisions, however, we often need to access data from many different sources and integrate different types of information. Manually collecting data from scores of websites and combining that data remains a daunting task. The ERC projects SeCo (Search Computing) and DIADEM (Domain-centric Intelligent Automated Data Extraction Methodology) address two aspects of this problem: SeCo supports complex search processes drawing on data from multiple domains with a user interface capable of refining and exploring the search results. DIADEM aims to automatically extract structured data from a domain's websites. In this paper, we outline a first approach for integrating SeCo and DIADEM. We discuss how to use the DIADEM methodology to automatically turn nearly any website from a given domain into a SeCo search service. We describe how such services can be registered and exploited by the SeCo framework in combination with services from other domains (and possibly developed with other methodologies).