Christoph Quix

15 records found

Data Lakes

A Survey of Functions and Systems

Data lakes are becoming increasingly prevalent for Big Data management and data analytics. In contrast to traditional 'schema-on-write' approaches such as data warehouses, data lakes are repositories storing raw data in its original formats and providing a common access interface ...
This chapter introduces the most important features of data lake systems, and from there it outlines an architecture for these systems. The vision for a data lake system is based on a generic and extensible architecture with a unified data model, facilitating the ingestion, stora ...
Schema mappings express the relationships between sources in data interoperability scenarios and can be expressed in various formalisms. Source-to-target tuple-generating dependencies (s-t tgds) can be easily used for data transformation or query rewriting tasks. Second-order tgd ...
Functional dependencies are important for the definition of constraints and relationships that have to be satisfied by every database instance. Relaxed functional dependencies (RFDs) can be used for data exploration and profiling in datasets with lower data quality. In this work, ...
In this work, we present a Metadata Framework in the direction of extending intelligence mechanisms from the Cloud to the Edge. To this end, we build on our previously introduced notion of Data Lagoons, the analogue of Data Lakes at the network edge, and we introduce a novel archi ...
New levels of cross-domain collaboration between manufacturing companies throughout the supply chain are anticipated to bring benefits to both suppliers and consumers of products. Enabling a fine-grained sharing and analysis of data among different stakeholders in an automated ma ...
JSON has become one of the most popular data formats. Yet studies on JSON data integration (DI) are scarce. In this work, we study one of the key DI tasks, nested mapping generation in the context of integrating heterogeneous JSON based data sources. We propose a novel mapping re ...
The increasing popularity of NoSQL systems has led to the model of polyglot persistence, in which several data management systems with different data models are used. Data lakes realize the polyglot persistence model by collecting data from various sources, by storing the data i ...
Interdisciplinary research and development projects in medical engineering benefit from well-selected collaboration partners. The process of finding such partners from often unfamiliar fields is difficult, but can be supported by an expert profile that is based on patent analysis and c ...
Medical engineering (ME) is an interdisciplinary domain with short innovation cycles. Usually, researchers from several fields cooperate in ME research projects. To support the identification of suitable partners for a project, we present an integrated approach for patent classif ...
The heterogeneity of sources in Big Data systems requires new integration approaches which can handle the large volume of the data as well as its variety. Data lakes have been proposed to reduce the upfront integration costs and to provide more flexibility in integrating and analy ...
As the challenge of our time, Big Data still poses many open research problems, especially regarding the variety of data. The high diversity of data sources often results in information silos, a collection of non-integrated data management systems with heterogeneous schemas, query languages, and A ...
Successful research and development projects start with finding the right partners for the venture. Especially for interdisciplinary projects, this is a difficult and tedious task as experts from foreign domains are not known. Furthermore, the transfer of knowledge from research ...
In addition to volume and velocity, Big Data is also characterized by its variety. Variety in structure and semantics requires new integration approaches which can resolve the integration challenges also for large volumes of data. Data lakes should reduce the upfront integration ...
Successful research and development projects start with finding the right partners for the venture. Especially for interdisciplinary projects, this is a difficult task as experts from foreign domains are not known. Furthermore, the transfer of knowledge from research into practic ...