Deep reinforcement learning is a compelling approach to exploring cluttered 3D environments, balancing fast computation with effective vision-based navigation. Yet 3D navigation for learning-based information gathering remains largely unexplored. Navigating in 3D space enlarges the state space, but it also opens up possibilities through the agent's greater mobility, making it a promising research direction. Furthermore, existing approaches to target mapping with 3D navigation do not consider cluttered environments, leaving obstacle avoidance and occlusion handling unaddressed.
This research introduces a novel deep reinforcement learning policy for vision-based information gathering with quadcopters, enabling efficient exploration of cluttered 3D environments. The core challenge is learning a time-efficient and collision-free exploration strategy; to this end, we develop both the policy design and the training procedure. We formulate a target-search task in which the goal is to reduce the agent's uncertainty about the target state. To achieve this, our method combines vision-based reasoning through deep reinforcement learning with probabilistic target mapping and an information-theoretic reward scheme, yielding a policy that makes informed exploration decisions.
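As a minimal sketch of such an information-theoretic reward scheme: one common choice is to reward each step with the reduction in Shannon entropy of the probabilistic target map after a new observation is integrated. The voxelized Bernoulli map, the array shapes, and the function names below are illustrative assumptions for this sketch, not the exact formulation used in this work.

```python
import numpy as np

def map_entropy(p: np.ndarray) -> float:
    """Shannon entropy (in bits) of a probabilistic target map.

    `p` holds per-voxel probabilities that the target occupies that voxel,
    treated here as independent Bernoulli variables (an assumption).
    """
    eps = 1e-12  # avoid log(0) at fully certain voxels
    p = np.clip(p, eps, 1.0 - eps)
    return float(-(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p)).sum())

def information_gain_reward(prev_map: np.ndarray, curr_map: np.ndarray) -> float:
    """Reward = entropy reduction after integrating the latest observation."""
    return map_entropy(prev_map) - map_entropy(curr_map)

# Usage: after each agent step, the map is updated from the camera
# observation and the entropy drop becomes the step reward.
prev_map = np.full((32, 32, 8), 0.5)   # maximally uncertain prior
curr_map = prev_map.copy()
curr_map[:16] = 0.05                   # half the volume observed as likely empty
print(f"information-gain reward: {information_gain_reward(prev_map, curr_map):.1f} bits")
```

Under this kind of reward, a time-efficient policy is one that chooses viewpoints whose observations shrink the map's uncertainty fastest, which is what ties the mapping module to the learned exploration behaviour.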
Experiments comparing our method with a privileged greedy baseline show that our policy significantly outperforms the baseline in all tested environments. Our ablation study further validates the policy design, as every ablation degrades exploration performance. Overall, the policy exhibits intelligent behaviour, navigating effectively through rooms and around obstacles. Room for improvement remains, however, as failure cases in which the agent gets stuck still occur sporadically. Nevertheless, the findings demonstrate the feasibility of learning a 3D navigation policy for effective target mapping with quadcopters.