From Supervised to Reinforcement Learning: an Inverse Optimization Approach
Abstract
We propose a novel method combining elements of supervised learning and Q-learning for the control of dynamical systems subject to unknown disturbances. Using the Inverse Optimization framework and in-hindsight information, we derive a causal parametric optimization policy that approximates a non-causal MPC expert. Furthermore, we propose a new min-max MPC scheme that robustifies against a ball around a nominal disturbance trajectory. This scheme admits an exact convex reformulation via the S-Lemma and is likewise approximated using Inverse Optimization. Finally, simulation studies illustrate and verify our approach.
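For intuition, the robust scheme can be read as a min-max problem over disturbances confined to a ball around a nominal trajectory. The following is a minimal sketch under assumed linear dynamics; the nominal disturbances $\hat w_k$, radius $\rho$, and stage costs $\ell, \ell_f$ are illustrative notation, not taken from the paper:

$$
\min_{u_0,\dots,u_{N-1}} \;\; \max_{\substack{w_0,\dots,w_{N-1}:\\ \sum_{k=0}^{N-1}\|w_k-\hat w_k\|^2 \le \rho^2}} \;\; \sum_{k=0}^{N-1} \ell(x_k,u_k) \;+\; \ell_f(x_N)
\qquad \text{s.t. } x_{k+1} = A x_k + B u_k + E w_k .
$$

With convex quadratic costs, the inner maximization is subject to a single quadratic (ball) constraint, which is exactly the setting in which the S-Lemma yields a lossless convex reformulation of the min-max problem.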