Interactive Reinforcement Learning for Adaptive Thermal Comfort


Abstract

Designing and implementing effective systems for thermal comfort management in buildings is complex because such systems must account for subjective preferences shaped by human physiology, biases, and tendencies. This research introduces a novel approach to simulating human interactions for thermal comfort management. Stochastic simulated humans provide feedback in the form of thermostat interactions, and their thermal comfort is inferred by converting these interactions into rewards, called human rewards. Control policies are trained with the Proximal Policy Optimization (PPO) reinforcement learning (RL) algorithm, using either the human reward or a reward based on the Predicted Mean Vote (PMV). It is shown that the learning process can be guided solely by human rewards. Experimental results assess the impact of this simulated human reward system on the adaptability of the RL model in single-human scenarios, using the PMV reward case as the ground truth for comparison. The policy trained with the PMV reward keeps PMV values within the range [-0.2, 0.2], while the policy trained with the human reward keeps them within [-0.6, 0.6]. By simulating human feedback as thermostat interactions, the proposed model is shown to capture a rough estimate of human thermal preference. This research paves the way for using simulated humans in interactive RL-based thermal comfort control.
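The abstract does not specify how a simulated human decides to touch the thermostat or how that interaction becomes a reward. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical occupant with a preferred temperature, a comfort tolerance, Gaussian perception noise, and a probability of acting when uncomfortable. Any thermostat interaction is treated as a signal of discomfort and mapped to a negative human reward. For comparison, a hypothetical PMV reward penalizes distance from thermal neutrality (PMV = 0). The PMV value is taken as an input rather than computed from the full PMV model. All names (SimulatedHuman, human_reward, pmv_reward) and parameter values are illustrative assumptions, and the PPO training loop that would consume these rewards is not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class SimulatedHuman:
    """Hypothetical stochastic occupant; every parameter is an illustrative assumption."""
    preferred_temp: float = 22.0  # assumed preferred indoor temperature (deg C)
    tolerance: float = 1.0        # assumed half-width of the comfort band (deg C)
    noise_std: float = 0.5        # assumed std. dev. of temperature perception noise (deg C)
    press_prob: float = 0.8       # assumed probability of acting when uncomfortable

    def interact(self, indoor_temp: float, rng: random.Random) -> int:
        """Return +1 (turn heat up), -1 (turn heat down) or 0 (no interaction)."""
        # The occupant perceives the indoor temperature with Gaussian noise.
        perceived = indoor_temp + rng.gauss(0.0, self.noise_std)
        error = perceived - self.preferred_temp
        # Inside the comfort band, or when the occupant does not bother to act,
        # no thermostat interaction happens.
        if abs(error) <= self.tolerance or rng.random() > self.press_prob:
            return 0
        # Too warm -> turn down; too cold -> turn up.
        return -1 if error > 0 else 1


def human_reward(interaction: int) -> float:
    """Hypothetical mapping: any thermostat interaction signals discomfort."""
    return -1.0 if interaction != 0 else 0.0


def pmv_reward(pmv: float) -> float:
    """Hypothetical PMV-based reward: penalize distance from neutrality (PMV = 0)."""
    return -abs(pmv)


if __name__ == "__main__":
    rng = random.Random(0)
    human = SimulatedHuman()
    for temp in (18.0, 22.0, 26.0):
        action = human.interact(temp, rng)
        print(temp, action, human_reward(action))
```

In this sketch the human reward is sparse and noisy: the agent is only penalized when the occupant happens to act, and noise or inaction can hide discomfort. This is one plausible reason why a policy guided solely by such rewards might settle into a wider PMV band than one trained on a dense PMV reward.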


License info not available