Exploring the Synergy between Inverse Reinforcement Learning and Reinforcement Learning From Human Feedback for Query Reduction
Abstract
Reinforcement Learning is a powerful tool for problems that require sequential decision-making. However, it often faces challenges due to the extensive reward engineering it requires. Reinforcement Learning from Human Feedback (RLHF) and Inverse Reinforcement Learning (IRL) hold the promise of learning a reward function without manual encoding. While RLHF uses feedback to estimate a reward function, IRL learns from demonstrations: examples of the desired behaviour provided by a teacher. In practice, both approaches have advantages and disadvantages. IRL typically learns faster, provided that demonstrations are correct and sufficiently diverse. However, obtaining optimal demonstrations is inherently hard, since a teacher may not cover all possibilities, and their examples might fail to demonstrate the intended behaviour. Interactive feedback is believed to be easier to provide than demonstrations. However, RLHF suffers from the curse of dimensionality and the learner's random behaviour during early learning trials. It also requires a large amount of evaluative feedback, i.e., queries to a human labeler. We propose a learning framework in which these two approaches can potentially benefit from one another, with the aim of investigating whether we can reduce the number of queries RLHF needs. Specifically, we use Adversarial IRL (AIRL) and RLHF with preference comparisons. We examine our approach in two experimental studies. Our results indicate that combining AIRL with RLHF yields promising outcomes, but the effectiveness depends strongly on the nature and number of demonstrations and on the specifics of the environment.
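The abstract does not fix an implementation, but the combination it describes can be illustrated with a minimal sketch: a reward network is first warm-started with an AIRL-style discriminator objective on demonstrations, and then fine-tuned from preference comparisons using the standard Bradley-Terry preference likelihood. This is not the thesis implementation; the environment dimensions, network sizes, training loop, and synthetic data below are illustrative assumptions only (the simplified discriminator omits the shaping term h(s) of full AIRL).

# Hedged sketch: warm-starting an RLHF reward model from demonstrations
# before preference-based fine-tuning. All shapes and data are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM = 4, 2  # assumed dimensions of a small control task


class RewardNet(nn.Module):
    # State-action reward model shared by both training stages.
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 64), nn.Tanh(), nn.Linear(64, 1)
        )

    def forward(self, obs, act):
        return self.mlp(torch.cat([obs, act], dim=-1)).squeeze(-1)


def airl_style_pretraining(reward_net, demo_obs, demo_act, agent_obs, agent_act, steps=200):
    # Stage 1: discriminator-style pretraining on demonstrations.
    # The reward net is trained to score expert transitions higher than
    # agent transitions (a logistic discriminator, AIRL-flavoured).
    opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
    for _ in range(steps):
        expert_logits = reward_net(demo_obs, demo_act)
        agent_logits = reward_net(agent_obs, agent_act)
        loss = (
            F.binary_cross_entropy_with_logits(expert_logits, torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(agent_logits, torch.zeros_like(agent_logits))
        )
        opt.zero_grad(); loss.backward(); opt.step()


def preference_finetuning(reward_net, seg_a, seg_b, prefs, steps=200):
    # Stage 2: RLHF-style fine-tuning from preference comparisons.
    # seg_a / seg_b: (batch, horizon, obs+act) trajectory segments shown to
    # the labeler; prefs: 1.0 if segment A was preferred, 0.0 otherwise.
    opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
    obs_a, act_a = seg_a[..., :OBS_DIM], seg_a[..., OBS_DIM:]
    obs_b, act_b = seg_b[..., :OBS_DIM], seg_b[..., OBS_DIM:]
    for _ in range(steps):
        ret_a = reward_net(obs_a, act_a).sum(dim=-1)  # summed reward per segment
        ret_b = reward_net(obs_b, act_b).sum(dim=-1)
        # Bradley-Terry model: P(A preferred over B) = sigmoid(ret_a - ret_b)
        loss = F.binary_cross_entropy_with_logits(ret_a - ret_b, prefs)
        opt.zero_grad(); loss.backward(); opt.step()


if __name__ == "__main__":
    torch.manual_seed(0)
    net = RewardNet()
    # Synthetic placeholder data standing in for demonstrations and queries.
    demo_obs, demo_act = torch.randn(128, OBS_DIM), torch.randn(128, ACT_DIM)
    agent_obs, agent_act = torch.randn(128, OBS_DIM), torch.randn(128, ACT_DIM)
    airl_style_pretraining(net, demo_obs, demo_act, agent_obs, agent_act)
    seg_a = torch.randn(32, 10, OBS_DIM + ACT_DIM)
    seg_b = torch.randn(32, 10, OBS_DIM + ACT_DIM)
    prefs = torch.randint(0, 2, (32,)).float()
    preference_finetuning(net, seg_a, seg_b, prefs)

The intent of the sketch is only to show where query reduction could come from: if Stage 1 already places the reward model near a sensible solution, Stage 2 may need fewer preference queries than training the reward model from scratch.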