Describing Data Processing Pipelines in Scientific Publications for Big Data Injection

Abstract

The rise of Big Data analytics has been a disruptive game changer for many application domains, allowing insights and knowledge extracted from external Big Data sets to be integrated into domain-specific applications and systems. The effective "injection" of external Big Data demands an understanding of the properties of the available data sets, as well as expertise in the available and most suitable methods for data collection, enrichment, and analysis. A prominent knowledge source is the scientific literature, where data processing pipelines are described, discussed, and evaluated. Such knowledge is, however, not readily accessible, due to its distributed and unstructured nature. In this paper, we propose a novel ontology aimed at modeling the properties of data processing pipelines, and of their related artifacts, as described in scientific publications. The ontology is the result of a requirements analysis that involved experts from both academia and industry. We showcase the effectiveness of our ontology by manually applying it to a collection of publications describing data processing methods.
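To make the idea concrete, the sketch below shows how a pipeline reported in a publication might be captured as RDF triples using Python's rdflib. Every name here is an illustrative assumption rather than the vocabulary actually proposed in the paper: the dpp namespace, the classes Publication, Pipeline, Step, and Dataset, and the properties describes, hasStep, precedes, consumes, and produces are all hypothetical.

```python
# Minimal sketch of annotating a publication's data processing pipeline
# with a hypothetical ontology; class and property names are assumptions,
# not the terms defined in the paper.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

DPP = Namespace("http://example.org/dpp#")    # hypothetical ontology IRI
EX = Namespace("http://example.org/papers#")  # hypothetical instance IRI

g = Graph()
g.bind("dpp", DPP)
g.bind("ex", EX)

# A publication describes a pipeline with two ordered steps.
g.add((EX.paper1, RDF.type, DPP.Publication))
g.add((EX.pipeline1, RDF.type, DPP.Pipeline))
g.add((EX.paper1, DPP.describes, EX.pipeline1))

g.add((EX.step1, RDF.type, DPP.Step))
g.add((EX.step1, RDFS.label, Literal("Data collection from a streaming API")))
g.add((EX.step2, RDF.type, DPP.Step))
g.add((EX.step2, RDFS.label, Literal("Named-entity enrichment")))

g.add((EX.pipeline1, DPP.hasStep, EX.step1))
g.add((EX.pipeline1, DPP.hasStep, EX.step2))
g.add((EX.step1, DPP.precedes, EX.step2))  # execution order

# Steps are linked to the data sets they produce and consume.
g.add((EX.rawData, RDF.type, DPP.Dataset))
g.add((EX.step1, DPP.produces, EX.rawData))
g.add((EX.step2, DPP.consumes, EX.rawData))

print(g.serialize(format="turtle"))
```

Modeling pipelines this way would let annotations from many publications be queried uniformly, for instance with SPARQL, to find all pipelines whose steps consume a given kind of data set.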
