Offline Reinforcement Learning (Offline RL) learns policies from a static dataset without further interaction with the environment, making it suitable for high-stakes scenarios where data collection is costly or risky. This paper investigates the generalization capabilities of Implicit Q-Learning (IQL), an offline RL algorithm, compared to Behavioral Cloning (BC). We adapt IQL for discrete control and evaluate both IQL and BC in a four-room environment using training datasets generated by different behavioral policies. Performance is assessed by average reward across multiple test seeds on reachable and unreachable tasks, as well as on the training tasks. Our results indicate that BC consistently outperforms IQL in all scenarios, although IQL reaches its peak performance faster. This study highlights the need for further research into offline RL algorithms that generalize better and perform more robustly in diverse environments. The full code is available on GitHub.
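For reference, the minimal sketch below shows the three losses that define IQL, with a categorical policy head for the discrete action space: expectile regression of the value function toward dataset Q-values, a TD update of Q bootstrapped from V, and advantage-weighted policy extraction. The module names, the expectile parameter tau, and the inverse temperature beta are illustrative assumptions, not the exact implementation from the released code.

```python
import torch
import torch.nn.functional as F


def expectile_loss(diff: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    # Asymmetric L2: positive errors are weighted by tau, negative ones by (1 - tau).
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()


def iql_losses(q_net, v_net, policy_net, batch, gamma=0.99, tau=0.7, beta=3.0):
    # batch holds tensors sampled from the static offline dataset
    # (names and shapes are assumptions for this sketch).
    s, a, r, s_next, done = batch  # a: LongTensor of discrete action indices

    # 1) Value loss: expectile regression of V(s) toward Q(s, a) of dataset actions.
    with torch.no_grad():
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    v_s = v_net(s).squeeze(-1)
    value_loss = expectile_loss(q_sa - v_s, tau)

    # 2) Q loss: one-step TD target bootstrapped from V(s'),
    #    which avoids querying Q on out-of-distribution actions.
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * v_net(s_next).squeeze(-1)
    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    q_loss = F.mse_loss(q_pred, target)

    # 3) Policy loss: advantage-weighted log-likelihood of dataset actions,
    #    using a categorical head over the discrete action set.
    with torch.no_grad():
        weights = torch.exp(beta * (q_sa - v_s)).clamp(max=100.0)
    log_probs = F.log_softmax(policy_net(s), dim=-1).gather(1, a.unsqueeze(1)).squeeze(1)
    policy_loss = -(weights * log_probs).mean()

    return value_loss, q_loss, policy_loss
```

In a full implementation, each loss would typically be minimized with its own optimizer, and the TD target would usually use a slowly updated target Q-network; BC corresponds to keeping only the (unweighted) log-likelihood term of the policy loss.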