ST-Sem
A Multimodal Method for Points-of-Interest Classification Using Street-Level Imagery
Abstract
Street-level imagery contains a variety of visual information about the facades of Points of Interest (POIs). In addition to general morphological features, signs on the facades of, primarily, business-related POIs can be a valuable source of information about the type and identity of a POI. Recent advancements in computer vision could leverage visual information from street-level imagery and contribute to the classification of POIs. However, there is currently a gap in the existing literature regarding the use of visual labels contained in street-level imagery, where their value as indicators of POI categories is assessed. This paper presents Scene-Text Semantics (ST-Sem), a novel method that leverages visual labels (e.g., texts, logos) from street-level imagery as complementary information for the categorization of business-related POIs. Contrary to existing methods that fuse visual and textual information at a feature level, we propose a late fusion approach that combines visual and textual cues after resolving issues of incorrect digitization and semantic ambiguity of the retrieved textual components. Experiments on two existing datasets and a newly created one show that ST-Sem can outperform visual-only approaches by 80% and related multimodal approaches by 4%.
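The late-fusion idea described above can be sketched as a weighted combination of per-class scores from the two modalities. The function name, the weighting scheme, and the example scores below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def late_fusion(visual_scores, text_scores, alpha=0.5):
    """Combine per-class scores from a visual classifier and a
    scene-text semantic matcher by a weighted sum (hypothetical
    sketch of late fusion, not the paper's exact method)."""
    v = np.asarray(visual_scores, dtype=float)
    t = np.asarray(text_scores, dtype=float)
    # Normalize each modality's scores to a probability distribution
    v = v / v.sum()
    t = t / t.sum()
    return alpha * v + (1 - alpha) * t

# Toy example: three POI categories, two modalities
fused = late_fusion([0.2, 0.5, 0.3], [0.1, 0.8, 0.1], alpha=0.5)
predicted = int(np.argmax(fused))  # index of the most likely category
```

In contrast to feature-level (early) fusion, each modality here is scored independently, so the textual branch can first correct digitization errors and disambiguate the recognized text before its scores are combined with the visual branch.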