MoReSo

Li, S.; Profeta, D.; Dauwels, J.H.G.

MoReSo: A DNN Framework Expediting Content-based Video Image Retrieval (CBVIR)

Conference paper (2024)

Authors

S. Li Signal Processing Systems

D. Profeta Signal Processing Systems

J.H.G. Dauwels Signal Processing Systems

Research Group

Signal Processing Systems

Content-Based Video Image Retrieval; Content-Based Image Retrieval; Key Frame Extraction; Image Retrieval from Video

To reference this document use:

http://resolver.tudelft.nl/uuid:d4e3d779-e6e6-45de-beb3-6252f37f8acb

Published Date

2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

With the exponential growth of video data, individuals, particularly scholars in history and sociology, increasingly rely on video materials. Locating specific frames within videos, however, remains laborious and time-consuming. Advanced machine-learning-assisted video processing techniques have emerged, including text-based video search, video summarization, real-time object detection, and person re-identification. Retrieving video frames based on given visual content poses a different challenge: pinpointing the occurrences of an instance both efficiently and accurately. To expedite this process while maintaining retrieval performance, we propose a two-stage approach that combines KeyFrame Extraction (KFE) and Content-based Image Retrieval (CBIR), underpinned by a DNN-empowered framework called MoReSo. Our innovations include 1) the integration of improved statistical features with dynamic clustering in the KFE stage and 2) the development of the MoReSo framework, which consists of MobileNet and ResNet backbones with an SOA layer to jointly represent video frames, achieving a 2.67x increase in efficiency compared to existing solutions. Our framework is evaluated on the annotated EHM Historical Database, provided by digital history researchers, and on the widely used Oxford and Paris image retrieval benchmark datasets. The experimental results show that the proposed framework and scheme outperform other models in the CBVIR task. We make our code available for further exploration through our GitHub repository, which contains the implementation of our model and a CBVIR system with a GUI prototype.
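The two-stage idea described in the abstract (first reduce a video to keyframes, then rank those keyframes against a query) can be illustrated with a minimal sketch. This is not the authors' implementation: the intensity histogram stands in for the paper's improved statistical features, a simple sequential thresholded clustering stands in for its dynamic clustering, and the same histograms stand in for the MobileNet/ResNet/SOA descriptors. The function names `frame_histogram`, `extract_keyframes`, and `retrieve`, and the `threshold` value, are hypothetical.

```python
import math


def frame_histogram(frame, bins=16):
    """Normalized intensity histogram of a frame (rows of 0-255 ints).

    Assumption: a stand-in for the statistical features used in the KFE stage.
    """
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def extract_keyframes(frames, threshold=0.5):
    """Stage 1 (KFE): open a new cluster whenever a frame's histogram drifts
    from the running cluster mean by more than `threshold` (L1 distance),
    and keep the first frame of each cluster as its keyframe.
    """
    keyframes, centroid, count = [], None, 0
    for idx, frame in enumerate(frames):
        hist = frame_histogram(frame)
        if centroid is None:
            drift = None
        else:
            drift = sum(abs(a - b) for a, b in zip(hist, centroid))
        if drift is None or drift > threshold:
            keyframes.append(idx)
            centroid, count = hist, 1
        else:
            # Update the running mean of the current cluster.
            count += 1
            centroid = [c + (h - c) / count for c, h in zip(centroid, hist)]
    return keyframes


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, keyframe_vecs, top_k=3):
    """Stage 2 (CBIR): rank keyframe descriptors by similarity to the query.

    Returns (position in keyframe_vecs, score) pairs, best match first.
    """
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(keyframe_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Synthetic "video": five dark frames followed by five bright frames.
dark = [[20] * 8 for _ in range(8)]
bright = [[200] * 8 for _ in range(8)]
video = [dark] * 5 + [bright] * 5

keys = extract_keyframes(video)
descriptors = [frame_histogram(video[i]) for i in keys]
matches = retrieve(frame_histogram(bright), descriptors, top_k=1)
```

Only the keyframes are compared against the query in the second stage, which is where a pipeline of this shape saves work. In a real system the descriptors would come from a learned network rather than from raw histograms.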

Files

MoReSo_A_DNN_Framework_Expedit... (pdf)

(pdf | 15.7 MB)

Unknown license

File under embargo until 28-04-2025