Shangtong Zhang

2 records found

Deep residual reinforcement learning

Conference paper (2020) - Shangtong Zhang (author), J.W. Böhmer (author), Shimon Whiteson (author)

We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperforms vanilla DDPG in the DeepMi ...

Generalized off-policy actor-critic

Journal article (2019) - Shangtong Zhang (author), J.W. Böhmer (author), Shimon Whiteson (author)

We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting. Compared to the commonly used excursion objective, which can be misleading about the performance ...