Regret Analysis of Learning-Based Linear Quadratic Gaussian Control with Additive Exploration

Master thesis (2023)

Authors

A. ATHREY Mechanical Engineering

Contributors

B.H.K. De Schutter Delft Center for Systems and Control (mentor)

S. Shi Team Bart De Schutter - Mechanical, Maritime and Materials Engineering (coach)

M. Khosravi Team Khosravi - Mechanical, Maritime and Materials Engineering (graduation committee member)

Othmane Mazhar KTH Royal Institute of Technology (graduation committee member)

Faculty

Mechanical Engineering

To reference this document use:

http://resolver.tudelft.nl/uuid:6c47623c-e3bf-48b1-b16b-a06155740df9

Published Date

30-10-2023

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis addresses the Learning-Based Control (LBC) of unknown, partially observable systems in the Linear Quadratic (LQ) paradigm. In this setting, the control action influences not only the control performance but also the rate at which the system is learnt. This creates a conflict between learning and control (exploration versus exploitation) that is particularly challenging to address. The thesis aims to develop a novel LBC algorithm for unknown, partially observable systems in the Linear Quadratic Gaussian (LQG) setting that is computationally efficient and guarantees an optimal exploration-exploitation trade-off, quantified by a metric called regret. The regret is the cumulative performance gap between the LBC policy and the ideal controller that has full knowledge of the true system dynamics. The main contribution is a novel LBC algorithm with a two-phase structure. The first phase injects Gaussian input signals to obtain an initial system model. The second phase deploys the proposed LBC strategy in an episodic setting, where the model is updated for each episode and the resulting updated LQG controller is applied with additive Gaussian signals for exploration. In addition, the thesis establishes strong theoretical guarantees on optimal regret growth.
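
For readers unfamiliar with the metric, one common way to write regret in the LQG literature is sketched below. The exact definition used in the thesis may differ, and every symbol here (the outputs y_t, the inputs u_t, the cost weights Q and R, the horizon T, and the optimal average cost J_*) is an assumption introduced for illustration, not notation taken from the abstract.

```latex
% Assumed notation (illustrative, not from the abstract):
%   y_t, u_t : measured output and control input at time t
%   Q, R     : positive (semi)definite weights of the quadratic cost
%   J_*      : optimal long-run average cost of the LQG controller
%              designed with full knowledge of the true system
\[
  \mathrm{Regret}(T)
  \;=\;
  \sum_{t=1}^{T} \left( y_t^{\top} Q\, y_t + u_t^{\top} R\, u_t \right)
  \;-\; T\, J_*
\]
```

Under this assumed definition, regret that grows sublinearly in T means the average cost incurred by the learning controller approaches J_* as T increases.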

Files

Msc_thesis_report_Archith_Athr... (pdf | 4.75 MB)