Coner

Vliegenthart, Daniel; Mesbah, S.; Lofi, Christoph; Aizawa, Akiko; Bozzon, A

doi:10.1007/978-3-030-30760-8_1

Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications

Conference paper (2019)

Authors

Daniel Vliegenthart (National Institute of Informatics, Student)

S. Mesbah (Web Information Systems)

Christoph Lofi (Web Information Systems)

Akiko Aizawa (National Institute of Informatics)

A Bozzon (Web Information Systems)

Research Group

Web Information Systems

DOI: https://doi.org/10.1007/978-3-030-30760-8_1

To reference this document use:

http://resolver.tudelft.nl/uuid:e3d634e4-6ba9-4ab4-b80a-ffaefc527091

Published Date

2019

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task, as the extensive training and test data needed for fine-tuning NER algorithms are typically lacking. Recent approaches have presented promising solutions that train NER algorithms in an iterative, weakly supervised fashion, limiting human interaction to providing a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited size of the training data. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach that incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific, rare long-tail NER extractors to be trained at low cost, but with ever-increasing performance while the algorithm is actively used in an application.
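The training cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the extractor (capitalized tokens that co-occur with a known entity), the stopword heuristic, the majority-vote rule, and all function and parameter names below are assumptions made purely for illustration. The sketch only shows the general idea of letting human relevance votes override a fallible heuristic filter in each iteration of a seed-based loop.

```python
def extract_candidates(sentences, known_entities):
    """Toy extractor (assumption): propose capitalized tokens that
    appear in a sentence together with an already known entity."""
    candidates = set()
    for sentence in sentences:
        tokens = sentence.split()
        if any(tok in known_entities for tok in tokens):
            candidates.update(
                tok for tok in tokens
                if tok[0].isupper() and tok not in known_entities
            )
    return candidates


def heuristic_keep(term, stopwords):
    """Stand-in for a fallible filtering heuristic (assumption)."""
    return term.lower() not in stopwords


def filter_with_feedback(candidates, feedback, stopwords, min_votes=2):
    """Keep a candidate if enough human votes mark it relevant;
    fall back to the heuristic when too few votes exist."""
    accepted = set()
    for term in candidates:
        votes = feedback.get(term, [])
        if len(votes) >= min_votes:
            keep = sum(votes) > len(votes) / 2  # majority of True votes
        else:
            keep = heuristic_keep(term, stopwords)
        if keep:
            accepted.add(term)
    return accepted


def train_iteratively(sentences, seeds, feedback, stopwords, iterations=2):
    """Grow the entity set from seed terms over several iterations."""
    entities = set(seeds)
    for _ in range(iterations):
        candidates = extract_candidates(sentences, entities)
        entities |= filter_with_feedback(candidates, feedback, stopwords)
    return entities


if __name__ == "__main__":
    sentences = [
        "We evaluate on CoNLL and OntoNotes",
        "Results on OntoNotes and TREC improve",
        "The CoNLL baseline is strong",
    ]
    # Hypothetical votes: three users marked "Results" as not relevant.
    feedback = {"Results": [False, False, False]}
    found = train_iteratively(sentences, {"CoNLL"}, feedback, {"we", "the"})
    print(sorted(found))  # ['CoNLL', 'OntoNotes', 'TREC']
```

In this toy setting the heuristic alone would accept "Results" as an entity; the collected votes remove it, which mirrors how feedback gathered during use could correct heuristic failures over time.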

Files

2019TPDL_Coner.pdf

(pdf | 0.455 Mb)

Unknown license

Vliegenthart2019_Chapter_Coner... (pdf)

(pdf | 1.12 Mb)

Unknown license

Download not available