Regret Analysis of Learning-Based Linear Quadratic Gaussian Control with Additive Exploration

Abstract

This thesis addresses the Learning-Based Control (LBC) of unknown, partially observable systems within the Linear Quadratic (LQ) framework. In learning-based LQ control, the control action affects not only the control performance but also the rate at which the system is learnt. This creates a conflict between learning and control (exploration versus exploitation) that is particularly challenging to resolve. The thesis aims to develop a novel LBC algorithm for unknown, partially observable systems in the Linear Quadratic Gaussian (LQG) setting that is computationally efficient and guarantees an optimal exploration-exploitation trade-off, as quantified by a metric called regret. Regret measures the cumulative performance gap between the LBC policy and an ideal controller with full knowledge of the true system dynamics. The main contribution is a novel LBC algorithm with a two-phase structure. In the first phase, Gaussian input signals are injected to obtain an initial system model. In the second phase, the proposed LBC strategy runs episodically: the model is re-estimated for each episode, and the resulting updated LQG controller is applied together with additive Gaussian signals for exploration. The thesis also establishes theoretical guarantees that the regret of the proposed algorithm grows at an optimal rate.
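To make the two-phase structure and the notion of regret concrete, the following is a minimal sketch assuming the standard discrete-time LQG formulation common in the learning-based control literature. The symbols (A, B, C, the noise covariances W and V, the cost weights Q and R, the episode-k gain K_k, the exploration variances sigma_u^2 and sigma_k^2, and the optimal average cost J^star) are illustrative notation introduced here, not necessarily the thesis's own. The exact regret definition used in the thesis may also differ.

```latex
\begin{aligned}
x_{t+1} &= A x_t + B u_t + w_t, &\qquad w_t &\sim \mathcal{N}(0, W),\\
y_t &= C x_t + v_t, &\qquad v_t &\sim \mathcal{N}(0, V),\\
\text{Phase 1:}\quad u_t &\sim \mathcal{N}(0, \sigma_u^2 I), && \\
\text{Phase 2, episode } k:\quad u_t &= -K_k \hat{x}_{t \mid t} + \eta_t, &\qquad \eta_t &\sim \mathcal{N}(0, \sigma_k^2 I),\\
\mathcal{R}(T) &= \sum_{t=1}^{T} \left( y_t^\top Q y_t + u_t^\top R u_t \right) - T J^\star. &&
\end{aligned}
```

Here K_k would be the LQR gain and \hat{x}_{t|t} the Kalman filter estimate, both computed from the model identified for episode k, while J^\star is the optimal steady-state average cost of the LQG controller designed with the true system matrices. If \mathcal{R}(T) grows sublinearly in T, the average cost of the learning controller approaches J^\star. The rate of that growth is what a regret guarantee characterizes.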