Diffusion MVSNet: A Learning-based MVS Boosted by Diffusion-Based Image Enhancement Model

Abstract

Multiview Stereo (MVS) reconstruction techniques have advanced significantly with the development of deep learning. However, their performance often deteriorates in low-light conditions, where feature extraction and matching become challenging. Traditional image enhancement methods, which rely on manual adjustments, are insufficient for MVS tasks under low illumination. We introduce an end-to-end MVS framework that incorporates a diffusion-based image enhancement model to improve MVS performance in low-light conditions. This integration improves the color rendering and visualization of 3D reconstructions and slightly enhances geometric accuracy. Our method uses a feature adapter to integrate images enhanced by the low-light diffusion model into CasMVSNet, refining the feature maps in poorly lit environments. Validation on the DTU and Tanks and Temples datasets demonstrates our model's robustness and generalizability across various lighting conditions and MVS pipelines, including GeoMVSNet and MVSNet. Our approach simplifies training by requiring only an adapter to be trained, rather than a full multi-view image enhancement model, underscoring the effectiveness of incorporating image enhancement into learning-based MVS frameworks for low-light conditions.