Incentive-based demand response (iDR) programs serve as important tools for distribution system operators (DSOs) to reduce electricity demand during periods of grid overload. In these programs, participants can decide to curtail their consumption in exchange for financial incentives. How much a participant is willing to curtail is often driven by individual preferences. Reinforcement Learning (RL) methods have been employed to automate participants' decision-making in these programs, typically relying on predefined reward designs based on observed behavioral patterns. This thesis introduced PbRL-iDR: a reinforcement learning approach that learns a reward function unique to each participant by querying them for preference labels on a set of trajectories. PbRL-iDR trains the reward model and the policy in an alternating cycle: first, queries are sent to the simulated participant to update the current reward model; then, the updated reward model is used to improve the policy. Two variations of the PbRL-iDR algorithm were proposed to improve query efficiency: active query selection (AQS) and parameter transfer from model ensemble (PTME). In experiments, PbRL-iDR achieved performance comparable to a DQN-based method, albeit with slower convergence. An ablation study tested the efficacy of AQS and PTME in reducing the number of queries necessary to learn a reward function. The results suggest that AQS helps the policy converge earlier and with fewer queries than PbRL-iDR without AQS, whereas PTME failed to yield similar improvements.
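To make the alternating cycle concrete, the sketch below illustrates, under assumed PyTorch interfaces, two ingredients of such a preference-based setup: a Bradley-Terry style reward-model update from a single preference label on a pair of trajectory segments, and an AQS-style query selection that picks the candidate pair an ensemble of reward models disagrees on most. The class and function names, network sizes, and loss form are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch, assuming a PyTorch implementation; names and hyperparameters
# are placeholders and do not reproduce the thesis code.

class RewardModel(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):                 # obs: (T, obs_dim)
        return self.net(obs).squeeze(-1)    # per-step reward estimates: (T,)


def preference_update(model, optimiser, seg_a, seg_b, label):
    """One gradient step on a labelled pair of trajectory segments.

    label = 1.0 if the participant preferred segment A, 0.0 for segment B.
    Uses a Bradley-Terry style likelihood on the predicted return difference.
    """
    logit = model(seg_a).sum() - model(seg_b).sum()
    loss = nn.functional.binary_cross_entropy_with_logits(
        logit, torch.tensor(label))
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()


def select_query_aqs(ensemble, candidate_pairs):
    """AQS-style selection: return the candidate pair whose preference the
    reward-model ensemble is least certain about, measured here as the
    variance of predicted return differences across ensemble members."""
    with torch.no_grad():
        disagreements = []
        for seg_a, seg_b in candidate_pairs:
            diffs = torch.stack(
                [m(seg_a).sum() - m(seg_b).sum() for m in ensemble])
            disagreements.append(diffs.var().item())
    best = max(range(len(candidate_pairs)), key=disagreements.__getitem__)
    return candidate_pairs[best]
```

In the full alternating cycle described above, a step like preference_update would be run on each newly labelled pair before the policy is improved against the updated reward model; without AQS, pairs would instead be sampled uniformly from the trajectory buffer.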