Video Captioning for the Visually Impaired


Abstract

Visual impairment affects over 2.2 billion people worldwide, underscoring the need for effective assistive technologies. This work develops a video captioning model tailored specifically to visually impaired users, leveraging recent advances in deep learning. Video captioning converts video frames into textual descriptions, bridging computer vision (CV) and natural language processing (NLP). We surveyed young visually impaired individuals from the Visio organization, whose feedback informed key design decisions for our model.
We enhance the existing S2VT model by modifying its temporal attention mechanism to improve recognition of visual surroundings, addressing the distinct challenges that visually impaired individuals face.
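
The abstract does not give implementation details for this mechanism. As a minimal sketch, a temporal attention layer of the kind described weights per-frame encoder features by their relevance to the current decoder state; the PyTorch code below illustrates the idea with hypothetical names and dimensions (feat_dim, hidden_dim, attn_dim) and is not the authors' exact implementation.

```python
# Sketch of a temporal attention layer over per-frame encoder features,
# as could be added to an S2VT-style encoder-decoder. Names and sizes
# are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalAttention(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int = 256):
        super().__init__()
        self.frame_proj = nn.Linear(feat_dim, attn_dim)    # project frame features
        self.state_proj = nn.Linear(hidden_dim, attn_dim)  # project decoder state
        self.score = nn.Linear(attn_dim, 1)                # scalar relevance score

    def forward(self, frame_feats: torch.Tensor, dec_state: torch.Tensor):
        # frame_feats: (batch, num_frames, feat_dim) CNN features per sampled frame
        # dec_state:   (batch, hidden_dim) current decoder hidden state
        energy = torch.tanh(
            self.frame_proj(frame_feats) + self.state_proj(dec_state).unsqueeze(1)
        )                                                    # (batch, num_frames, attn_dim)
        weights = F.softmax(self.score(energy).squeeze(-1), dim=1)  # attention over frames
        context = torch.bmm(weights.unsqueeze(1), frame_feats).squeeze(1)  # weighted sum
        return context, weights
```

At each decoding step, the returned context vector is concatenated with the word embedding (or decoder input) so that caption generation can focus on the frames most relevant to the word being produced.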
This research explores critical questions surrounding the model's sensitivity to actions, the readability of generated captions, and methods for reducing latency. To evaluate the model's effectiveness, we apply readability metrics, an approach not previously used in video captioning evaluation. Our findings contribute to enhancing accessibility and independence for visually impaired individuals through advanced video captioning solutions.
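
The abstract does not name the specific readability metrics used. As an illustration only, standard formulas such as Flesch Reading Ease and Flesch-Kincaid Grade can be computed over generated captions, for example with the textstat package; the snippet below is a sketch under that assumption.

```python
# Illustrative readability scoring of generated captions.
# The paper's actual metric choice is not stated in the abstract; Flesch
# Reading Ease and Flesch-Kincaid Grade are shown as common examples.
# Requires: pip install textstat
import textstat

captions = [
    "A person walks across a busy street while cars wait at the light.",
    "Two children play with a dog in a park.",
]

for caption in captions:
    ease = textstat.flesch_reading_ease(caption)     # higher score = easier to read
    grade = textstat.flesch_kincaid_grade(caption)   # approximate US school grade level
    print(f"{caption!r}: ease={ease:.1f}, grade={grade:.1f}")
```

Such scores complement standard captioning metrics (e.g., BLEU or METEOR) by measuring how easily the generated text can be understood when read aloud by a screen reader.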