Abstraction-Guided Modular Reinforcement Learning


Abstract

Reinforcement learning (RL) models the learning process of humans, but as advances are made with increasingly deep neural networks, some of the fundamental strengths of human learning remain underutilized by RL agents. One of the most appealing properties of RL is that it appears to be remarkably flexible, requiring no model or prior knowledge of the task to be solved. However, this thesis argues that RL is inherently inflexible for two main reasons: (1) if existing knowledge is available, incorporating it without compromising the optimality of the solution is highly non-trivial, and (2) RL solutions cannot easily be transferred between tasks, and generally require complete retraining to guarantee that a solution will work in a new task.
Humans, on the other hand, are very flexible learners. We easily transfer knowledge from one task to another, and can build on what we learned in other tasks or on what other people share with us. Humans are exceptionally good at abstraction: developing conceptual understanding that lets us extend knowledge to never-before-seen experiences. No artificial agent or neural network has displayed the abstraction and generalization capabilities of humans across such varied tasks and environments. Despite this, the human is commonly used as a tool for abstraction only at the stage of defining the model. In practice, this means making choices about what to include in the state space so that the problem becomes solvable without adding unnecessary complexity. While necessary, this step is not explicitly referred to as abstraction, and it is generally not considered relevant to how RL is applied. Much of the research in RL is less focused on how the problem is modelled, and instead centres on the development and application of computational advances that allow for solving bigger and bigger problems.
Applying abstraction explicitly is highly non-trivial: confirming that an abstract problem preserves the necessary information of the true problem can generally only be done once a full solution has been found, which defeats the purpose of the abstraction when no such solution is available. Even when such confirmation is possible, the abstraction may be the result of a very complex function that would be difficult for a human to define. In this work, human-defined abstractions are used in a way that goes beyond the initial definition of the problem.
The first approach, presented in Chapter 3, breaks a problem into several abstract problems and uses the same experience to solve all of them simultaneously. A meta-agent learns how to compose the learned policies to find the optimal policy. In Chapter 4, a method is introduced that uses supervised learning to train a model on partially observable experience labelled in hindsight. The agent then learns a policy on predicted states, trading off information gathering against reward maximization. The last method, presented in Chapter 5, is a modular approach to offline RL, where even expert data can be insufficient if it does not cover the entire problem space. This method introduces a second problem: recovering the agent to a state in which it can safely follow the expert's action. The method applies abstraction to multiply the given data and to safely plan recovery policies. Combining the recovery policies with the imitation policy maintains high performance even when the expert data provided is limited.
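To make the first approach concrete, the following is a minimal sketch of one way such a modular learner could look. The tabular Q-learning modules, the abstraction functions phi, and the meta-agent that learns which module to follow in which situation are illustrative assumptions for this sketch, not the implementation used in the dissertation.

import random
from collections import defaultdict

class AbstractModule:
    """Tabular Q-learner over an abstraction phi of the full state."""
    def __init__(self, phi, n_actions, alpha=0.1, gamma=0.99):
        self.phi, self.n_actions = phi, n_actions
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def update(self, s, a, r, s2, done):
        z, z2 = self.phi(s), self.phi(s2)          # abstract states
        target = r + (0.0 if done else self.gamma * max(self.q[z2]))
        self.q[z][a] += self.alpha * (target - self.q[z][a])

    def greedy(self, s):
        vals = self.q[self.phi(s)]
        return max(range(self.n_actions), key=vals.__getitem__)

class MetaAgent:
    """Learns which module to trust in which (abstract) situation."""
    def __init__(self, modules, phi_meta, alpha=0.1, gamma=0.99, eps=0.1):
        self.modules, self.phi_meta = modules, phi_meta
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.q = defaultdict(lambda: [0.0] * len(modules))

    def act(self, s):
        z = self.phi_meta(s)
        if random.random() < self.eps:
            k = random.randrange(len(self.modules))
        else:
            k = max(range(len(self.modules)), key=self.q[z].__getitem__)
        return k, self.modules[k].greedy(s)

    def learn(self, s, k, a, r, s2, done):
        # The same transition updates every abstract module in parallel.
        for m in self.modules:
            m.update(s, a, r, s2, done)
        # The meta-agent learns the value of having followed module k.
        z, z2 = self.phi_meta(s), self.phi_meta(s2)
        target = r + (0.0 if done else self.gamma * max(self.q[z2]))
        self.q[z][k] += self.alpha * (target - self.q[z][k])

In this sketch every transition updates all modules at once, which is one simple way of realizing the parallel learning from shared experience described above; the composition rule the meta-agent uses here (choosing a single module's greedy action) is only one of many possible choices.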
In the methods developed in this research, a learning-to-learn component enables the agent to relax the usually strict requirements of abstraction, parallel processing allows the agent to learn more from fewer samples, and modularity means that the agent can transfer its knowledge to other, related tasks.
