Exploring reinforcement learning methods for autonomous sequencing and spacing of aircraft
Research on reinforcement learning algorithms for playing complex video games has produced controllers that surpass human performance. This paper explores the possibility of applying these techniques to the sequencing and spacing of aircraft. Two experiments are performed. In the first, a single aircraft must learn to fly a 4D trajectory using only heading commands. A Duelling Deep Q-Network was applied to train the agent and learn a successful policy; however, learning is unstable and does not provide a suitable basis for extension to a multi-agent setting. In the second, a multi-agent experiment, aircraft have to sequence and space themselves for landing without a 4D constraint. A Bidirectional Communication Net was trained using Deep Deterministic Policy Gradients, first on a single traffic scenario and then on multiple traffic scenarios. Training on the single scenario produced emergent strategies, such as a holding pattern, but no optimal policy was found. Training on multiple traffic scenarios showed no coordination efforts between the aircraft. Further analysis showed the importance of a proper reward function and exploration strategy; shortcomings in both were likely the reason no optimal policy was found in the multi-agent setting.