MM

M.D.I. Museur

1 records found

Recent work has shown that offline reinforcement learning (RL) does not generalize well to new environments compared to behavioral cloning (BC). We propose WSAC-N, an ensemble model of soft actor-critics with weights to de-emphasize actions with high variance. We compare the zero ...