The rapid evolution of Unmanned Aerial Vehicles (UAVs) has revolutionized target search operations in various fields, including military applications, search and rescue missions, and post-disaster management. In this paper, we propose the use of a multi-armed bandit algorithm for
...
The rapid evolution of Unmanned Aerial Vehicles (UAVs) has revolutionized target search operations in various fields, including military applications, search and rescue missions, and post-disaster management. In this paper, we propose the use of a multi-armed bandit algorithm for a UAV's search mission in an unknown and adversarial setting. The UAV's objective is to locate a mobile target formation, assuming that their mobility resembles an adversarial behavior. To achieve this, we formulate an optimization problem and leverage the Exp3 (exponential-weighted exploration and exploitation) algorithm to solve it. The targets are assumed to be moving under the assumption of an unknown and potentially non-stationary probability distribution. To enhance the learning process, we integrate environmental observations as contextual information, resulting in a variant called C-Exp3, which optimizes the search process. Finally, we evaluate the performance of C-Exp3 in UAV search missions, focusing on adversarial environments. The primary objective for the UAV is to converge towards an optimal policy as time t approaches the horizon T, reflecting the UAV's capacity to learn the formation's strategy.
@en