Deep Reinforcement Learning (DRL) enables us to design controllers for complex tasks using a deep learning approach, including controllers that are cumbersome to design with conventional control methodologies. Often, an objective in RL is binary in nature. However, exploration in environments with sparse rewards is a known problem in RL, and finding a positive reward becomes increasingly difficult as environment complexity grows. For this project, our objective is to design an RL-based controller for landing a quadcopter on inclined surfaces. Landing is defined as reaching the inclined surface at a sufficiently low speed that, upon impact, neither the quadcopter nor the landing surface is damaged. We use a binary reward for this task.

To aid exploration in this sparse-reward environment, we use Hindsight Experience Replay (HER) and non-optimized demonstrations. HER resamples goals from the demonstration data and the policy rollouts: a portion of the states visited during a rollout are treated as if they had been the intended goals. The demonstrations are non-optimized in the sense that they do not follow the same objective as ours; we consider demonstrations valid if they are obtained from arbitrary stable policies.

Our results show that the RL system generalizes to other goals when using HER and demonstrations; the demonstrations are not merely imitated, as would happen in pure imitation learning. HER, in turn, enabled us to receive reward in our complex environment while also letting us experience multiple goals in a single policy rollout. Without HER and demonstrations, the agent was unable to overcome the exploration problem posed by sparse rewards. We conclude that landing a quadcopter on inclined surfaces using an RL-based controller is feasible. Our trajectories clearly show a swinging motion, which in theory is a valid control strategy for this problem: the swing produces dead spots in which the quadcopter has minimal translational and rotational velocities while at a relatively large angle. Further research is needed to increase the accuracy and robustness of our RL-based controller.
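To make the goal-resampling idea concrete, the sketch below shows a minimal HER relabeling loop (the "future" strategy) combined with a binary landing reward. It is an illustration under assumptions, not code from this project: the names (`binary_landing_reward`, `her_relabel`), the goal encoding as position plus velocity, and the tolerance parameters are all hypothetical.

```python
import numpy as np

def binary_landing_reward(achieved_goal, desired_goal,
                          pos_tol=0.1, vel_tol=0.5):
    """Sparse binary reward: 0 on success, -1 otherwise.

    A goal is assumed to be a 6-vector (x, y, z, vx, vy, vz); landing
    succeeds when the achieved position is near the desired pad position
    and the impact speed is low. Tolerances are illustrative.
    """
    pos_err = np.linalg.norm(achieved_goal[:3] - desired_goal[:3])
    speed = np.linalg.norm(achieved_goal[3:])
    return 0.0 if (pos_err < pos_tol and speed < vel_tol) else -1.0

def her_relabel(episode, k=4, rng=None):
    """Augment an episode with HER-relabeled transitions ("future" strategy).

    episode: list of dicts with keys 'obs', 'action', 'achieved_goal',
             'desired_goal'. For each transition, k achieved goals drawn
    from later steps of the same rollout are substituted as if they had
    been the intended goal, so even a failed landing yields reward signal.
    """
    rng = rng or np.random.default_rng()
    relabeled = []
    T = len(episode)
    for t, tr in enumerate(episode):
        for _ in range(k):
            future = rng.integers(t, T)               # a later step in the rollout
            new_goal = episode[future]['achieved_goal']
            relabeled.append({
                'obs': tr['obs'],
                'action': tr['action'],
                'achieved_goal': tr['achieved_goal'],
                'desired_goal': new_goal,              # substituted goal
                'reward': binary_landing_reward(tr['achieved_goal'], new_goal),
            })
    return relabeled
```

The point of the relabeling is that a rollout which never reaches the original pad still produces transitions with reward 0 for the goals it did reach, which is what makes a purely binary reward tractable in a sparse-reward setting.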