Workshop on Human-in-the-loop Data Curation

Demartini, Gianluca; Yang, J.; Sadiq, Shazia

doi:10.1145/3511808.3557498

Conference paper (2022)

Authors

Gianluca Demartini (University of Queensland)

J. Yang (Web Information Systems)

Shazia Sadiq (University of Queensland)

Research Group

Web Information Systems (TU Delft)

To reference this document use:

http://resolver.tudelft.nl/uuid:fedae44d-5345-477a-8271-9194a28647e8

Published Date

2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Software Technology

Abstract

Although data quality is a long-standing problem, it has recently received renewed attention due to the rapid proliferation of data analytics, machine learning, and decision-support applications built upon the wide-scale availability and accessibility of (big) data. The success of such applications relies heavily not only on the quantity but also on the quality of data. Data curation, which may include annotation, cleaning, transformation, and integration, among other activities, is a critical step in providing adequate assurances about the quality of analytics and machine learning results. Such data preparation activities are recognised as time- and resource-intensive for data scientists, as data often comes with a number of challenges that need to be tackled before it can be used in practice. Data re-purposing, and the resulting distance between the design and use intentions of the data, is a fundamental issue behind many of these challenges. They include a variety of data issues such as noise and outliers, incompleteness, issues of representativeness or bias, and heterogeneity of format or semantics. Mishandling these challenges can lead to negative and sometimes damaging effects, especially in critical domains like healthcare, transport, and finance. A distinctive feature of data quality in these contexts is the increasingly important role played by humans, who are often both the source of data generation and active players in data curation. This workshop will provide an opportunity to explore the interdisciplinary overlap between manual, automated, and hybrid human-machine methods of data curation.

Files

3511808.3557498.pdf

(pdf | 0.891 Mb)

Unknown license