The control of aircraft can be carried out by Reinforcement Learning agents; however, the difficulty of obtaining sufficient training samples often makes this approach infeasible. Demonstrations can be used to facilitate the learning process, yet algorithms such as Apprenticeship Learning generally fail to produce a policy that outperforms the demonstrator, and therefore cannot generate good policies efficiently. In this paper, a model-free learning algorithm based on Apprenticeship Learning with Reinforcement Learning in the loop is therefore proposed. The algorithm uses external measurements to improve on the initial demonstration, ultimately producing a policy that surpasses it. Efficiency is further improved by reusing the policies produced during the learning process. Empirical results on simulated quadrotor control show that the proposed algorithm is effective and can learn good policies even from a poor demonstration.
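To make the overall structure concrete, the sketch below illustrates a projection-style Apprenticeship Learning loop with a reinforcement-learning improvement step inside it, applied to a toy one-dimensional stabilisation task. This is only an illustrative assumption of the general scheme, not the paper's quadrotor setup or its actual algorithm: the environment, the `simulate` and `rl_step` helpers, the linear policy class, and the gain grid are all hypothetical. The point it conveys is that intermediate policies are retained and the externally measured return is used to select the final policy, so the result can surpass a weak demonstrator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D stabilisation task standing in for the quadrotor simulator:
# the state is a tracking error, the action is a corrective input, and the
# externally measured "true" reward is the negative squared error.
def simulate(gain, horizon=60, episodes=10):
    """Roll out the linear policy a = gain * s; return the feature
    expectations (mean of [|s|, s^2]) and the mean true return."""
    feats, returns = [], []
    for _ in range(episodes):
        s, ep_feats, ep_ret = 1.0, [], 0.0
        for _ in range(horizon):
            a = gain * s
            s = 0.9 * s + 0.2 * a + rng.normal(scale=0.02)
            ep_feats.append([abs(s), s ** 2])
            ep_ret += -s ** 2
        feats.append(np.mean(ep_feats, axis=0))
        returns.append(ep_ret)
    return np.mean(feats, axis=0), np.mean(returns)

def rl_step(w, gains=np.linspace(-3.0, 0.0, 31)):
    """Stand-in RL step: pick the policy (gain) that maximises the surrogate
    reward w . phi(s), estimated by Monte-Carlo rollouts."""
    scores = [w @ simulate(g)[0] for g in gains]
    return gains[int(np.argmax(scores))]

# Hypothetical demonstrator: a deliberately weak proportional controller.
demo_gain = -0.3
mu_expert, demo_return = simulate(demo_gain)

# Projection-style Apprenticeship Learning with the RL step in the loop.
# All intermediate policies are kept, and the externally measured return
# is used to choose the final policy, so it can exceed the demonstrator.
gain = 0.0
mu, ret = simulate(gain)
mu_bar, policies = mu, [(gain, ret)]
for _ in range(8):
    w = mu_expert - mu_bar                      # current reward weights
    gain = rl_step(w)                           # RL improvement step
    mu, true_return = simulate(gain)
    policies.append((gain, true_return))
    d = mu - mu_bar                             # projection update of mu_bar
    mu_bar = mu_bar + (d @ (mu_expert - mu_bar)) / (d @ d + 1e-12) * d

best_gain, best_return = max(policies, key=lambda p: p[1])
print(f"demonstrator return: {demo_return:.3f}")
print(f"best learned return: {best_return:.3f} (gain {best_gain:.2f})")
```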