This thesis investigates the stability and robustness of the Lyapunov Actor-Critic (LAC) algorithm in comparison to the widely used Soft Actor-Critic (SAC) algorithm. Motivated by the need for reliable and robust control systems capable of operating in dynamic and unpredictable environments, the research initially aimed to explore LAC's performance on an effort-controlled robotic manipulation task. However, the focus shifted to a reproduction study of Han et al. (2020) after key discrepancies were identified in the implementation details and hyperparameters of their study, along with several undocumented aspects of their experimental setup.
The primary goal was to validate the reproducibility of the results of Han et al. (2020) and to assess the role of critical parameters, such as the α₃ stability constraint, in the performance and stability of the LAC algorithm across several simulated environments. By reimplementing and rigorously testing both algorithms, the study confirmed that the α₃ parameter significantly affects the performance and stability of LAC, particularly in complex environments where maintaining stability is essential for successful operation. A more conservative α₃ value, which tightens the stability constraint, improves performance and stability but also increases the time the agent needs to adapt and stabilize its policy, highlighting the trade-off between stability and adaptability in scenarios that demand both. Despite some minor differences, the study confirmed the reproducibility of the findings of Han et al. (2020) and demonstrated the potential benefits of LAC over SAC in terms of stability and robustness. Furthermore, the study showed that other hyperparameters, such as the network architecture, training length, and horizon length, strongly influence the performance and stability of both algorithms, underscoring the importance of careful hyperparameter tuning, transparent reporting, and consistent experimental configurations when reproducing and validating results in reinforcement learning research.
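For context, α₃ enters the Lyapunov decrease condition that LAC enforces during policy optimization. As a sketch, in the form reported by Han et al. (2020), with L_c denoting the learned Lyapunov critic, c(s, a) the step cost, and π the policy, the constraint reads roughly

𝔼[L_c(s', π(s')) − L_c(s, a) + α₃ c(s, a)] ≤ 0,

so a larger α₃ requires a faster per-step decrease of the Lyapunov value, which is consistent with the observed trade-off between stability and adaptation time.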
By validating the stability of the LAC algorithm across different simulated environments, and by developing detailed guidelines for key hyperparameters together with a robust codebase, this research provides a solid foundation for extending the algorithm to more complex tasks, such as effort-controlled environments. These contributions facilitate reproducibility and pave the way for future research into the LAC algorithm's ability to operate in dynamic and unpredictable environments, ultimately supporting the safe deployment of learning-based controllers in real-world systems where stability and reliability are paramount.