An Online Learning Framework for UAV Target Search Missions in Non-Stationary Environments

Khial, Noor; Mhaisen, N.; Mabrok, Mohamed; Mohamed, Amr

Conference paper (2024)

Authors

Noor Khial (Qatar University)

N. Mhaisen (Networked Systems)

Mohamed Mabrok (Qatar University)

Amr Mohamed (Qatar University)

Research Group

Networked Systems

UAV, Online Learning, Multi-Armed Bandits, Search Mission

To reference this document use:

http://resolver.tudelft.nl/uuid:7b296a9b-8a49-41ea-96f1-02b8e41e4cf7

Published Date

2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid evolution of Unmanned Aerial Vehicles (UAVs) has transformed target search operations in fields such as military applications, search and rescue missions, and post-disaster management. In this paper, we propose using a multi-armed bandit algorithm for a UAV search mission in an unknown and adversarial setting. The UAV's objective is to locate a mobile target formation whose mobility is assumed to resemble adversarial behavior. To this end, we formulate an optimization problem and solve it with the Exp3 (Exponential-weight algorithm for Exploration and Exploitation) algorithm. The targets are assumed to move according to an unknown and potentially non-stationary probability distribution. To enhance the learning process, we integrate environmental observations as contextual information, resulting in a variant called C-Exp3. Finally, we evaluate the performance of C-Exp3 in UAV search missions, focusing on adversarial environments. The primary objective for the UAV is to converge towards an optimal policy as time t approaches the horizon T, reflecting the UAV's capacity to learn the formation's strategy.
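For readers unfamiliar with the bandit machinery named in the abstract, the following is a minimal sketch of textbook Exp3, with a simple contextual wrapper that keeps one independent learner per observed context. It is an illustration only and is not taken from the paper: the class names, the exploration parameter gamma, the reward range [0, 1], and the one-learner-per-context construction are all assumptions, and the paper's C-Exp3 may differ in how it uses context. Here an "arm" can be read as a candidate search cell and the reward as whether the target was found there.

```python
import math
import random


class Exp3:
    """Textbook Exp3 for rewards in [0, 1] (illustrative, not the paper's code)."""

    def __init__(self, n_arms, gamma=0.1, rng=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.n_arms = n_arms
        self.gamma = gamma              # exploration rate (assumed parameter)
        self.weights = [1.0] * n_arms   # one exponential weight per arm
        self.rng = rng or random.Random()

    def probabilities(self):
        # Mix the normalized weights with a uniform exploration term.
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def select(self):
        # Sample an arm index according to the current distribution.
        probs = self.probabilities()
        return self.rng.choices(range(self.n_arms), weights=probs)[0]

    def update(self, arm, reward):
        # Importance-weighted estimate: only the played arm is updated.
        probs = self.probabilities()
        estimate = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale by the largest weight to avoid overflow; the sampling
        # distribution is invariant to a common scaling of the weights.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


class ContextualExp3:
    """Hypothetical contextual variant: one Exp3 learner per context value."""

    def __init__(self, n_arms, gamma=0.1, rng=None):
        self.n_arms = n_arms
        self.gamma = gamma
        self.rng = rng or random.Random()
        self.learners = {}  # maps a hashable context to its Exp3 instance

    def _learner(self, context):
        if context not in self.learners:
            self.learners[context] = Exp3(self.n_arms, self.gamma, self.rng)
        return self.learners[context]

    def select(self, context):
        return self._learner(context).select()

    def update(self, context, arm, reward):
        self._learner(context).update(arm, reward)
```

In a search loop, the UAV would call select with the current observation, visit the returned cell, and then call update with reward 1.0 if the target was found and 0.0 otherwise. The uniform mixing term keeps every cell's selection probability at or above gamma divided by the number of arms, which is what lets the learner keep tracking a target whose behavior changes over time.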

Files

An_Online_Learning_Framework_f... (pdf)

(pdf | 3.94 MB)

- Embargo expired on 12-03-2025

License info not available