Reinforcement Learning (RL) is rapidly becoming a mainstay research direction within Air Traffic Management and Control (ATM/ATC). Many international consortia and individual works have explored its applicability to different ATC and U-Space / Unmanned Aircraft System Traffic Management (UTM) tasks, such as merging traffic flows, with varying levels of success. However, to date there is no common basis on which these RL techniques can be compared, with many research parties building their own simulators and scenarios from scratch. This diminishes the value of the research, as the performance of an algorithm cannot easily be verified or compared against other implementations, hampering development in the long run. The Gymnasium library has shown, for other research domains, that this can be solved by providing a set of standardised environments on which different algorithms can be tested and compared against benchmark results. This paper proposes BlueSky-Gym: a library that provides a similar set of test environments for the aviation domain, building on the existing open-source air traffic simulator BlueSky. The current BlueSky-Gym environments range from vertical descent tasks to static obstacle avoidance and traffic flow merging. Built upon the Gymnasium API and the BlueSky air traffic simulator, it delivers an open-source solution for ATC-specific RL performance benchmarking. The initial release of BlueSky-Gym contains seven functional environments. Preliminary experiments with PPO, SAC, DDPG and TD3 are presented in this paper. Results show that stable training is obtained on all environments with the default hyperparameters. On some environments there is a large performance gap, with the on-policy PPO often trailing, but overall no single algorithm outperforms the others across the board in terms of total reward.
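Since the library exposes its environments through the standard Gymnasium API, a typical experiment would look like the minimal sketch below. This is an illustration only: the environment ID ("MergeEnv-v0"), the package name `bluesky_gym`, and the `register_envs()` registration hook are assumptions following common Gymnasium plugin conventions, not details confirmed by the abstract; PPO from stable-baselines3 stands in for any of the algorithms mentioned.

```python
# Minimal sketch: training and evaluating an agent on an assumed
# BlueSky-Gym environment through the Gymnasium API.
import gymnasium as gym
from stable_baselines3 import PPO

import bluesky_gym            # assumed package name
bluesky_gym.register_envs()   # assumed hook that registers the environments with Gymnasium

# Hypothetical environment ID for the traffic flow merging task.
env = gym.make("MergeEnv-v0")

# Train with default hyperparameters, as in the preliminary experiments.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

# Roll out one episode with the trained policy.
obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```

Because every environment follows the same observation/action/reward interface, the same script can be pointed at any of the seven environments or at a different algorithm (SAC, DDPG, TD3) by swapping the environment ID or the agent class.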