Policy Analysis of Safe Vertical Manoeuvring using Reinforcement Learning: Identifying when to Act and when to stay Idle

More Info
expand_more

Abstract

The number of unmanned aircraft operating in the airspace is expected to grow exponentially during the next decades. This will likely lead to traffic densities that are higher than those currently observed in civil and general aviation, and might require both a different airspace structure compared to conventional aviation, as well as different conflict resolution methods. One of the main disadvantages of analytical conflict resolution methods, in high-traffic density scenarios, is that they can cause instabilities of the airspace due to a domino effect of secondary conflicts. Therefore, many studies have also investigated other methods of conflict resolution, such as Deep Reinforcement Learning, which have shown positive results, but tend to be hard to explain due to their black-box nature. This paper investigates if it is possible to explain the behaviour of a Soft Actor-Critic model, trained for resolving vertical conflicts in a layered urban airspace, by interpreting the policy through a heat map of the selected actions. It was found that the model actively changes its policy depending on the degrees of freedom and has a tendency to adopt preventive behaviour on top of conflict resolution. This behaviour can be directly linked to a decrease in secondary conflicts when compared to analytical methods and can potentially be incorporated into these methods to improve them while maintaining explainability.

Files