This thesis presents an approach to monocular depth estimation for Unmanned Aerial Vehicles (UAVs). Monocular depth estimation is a critical perception task for UAVs, enabling them to infer depth from visual data without relying on heavy or power-hungry sensors such as LiDAR or stereo cameras. Given the operational constraints of UAVs, such as limited payload and energy budgets, robust and efficient depth estimation methods are required for safe navigation and environmental interaction. The proposed methodology fuses visual data from a monocular camera with inertial measurements from an Inertial Measurement Unit (IMU). This combination addresses challenges common in aerial operation, such as scale ambiguity in the depth estimates and inaccuracies in dynamic environments. Integrating the IMU data through a differentiable, camera-centric Extended Kalman Filter (EKF) improves ego-motion estimation, effectively aligning the visual information with the drone's dynamics. The method further incorporates depth map frame prediction, leveraging initial depth estimates together with temporal dynamics to predict future depth maps. This predictive capability improves efficiency by removing the need for full depth estimation in every frame and allows robotic agents to anticipate environmental changes. Evaluation on simulated and real-world datasets shows that the algorithm performs well over short forecast horizons, while accumulated IMU error and the assumption of a static environment limit its long-term accuracy. The future depth map prediction algorithm reduced the required rate of full DynaDepth inference from 10 runs per second to 2, and on the Mid-Air dataset from 25 to 5. Finally, this study lays a foundation for future work, including the integration of an object-oriented frame prediction algorithm.