Reinforcement learning, especially deep reinforcement learning, has made many advances in the last decade. Similarly, great strides have been made in multi-agent reinforcement learning. Systems of cooperative autonomous robots are increasingly being used, for which multi-agent re
...
Reinforcement learning, especially deep reinforcement learning, has made many advances in the last decade. Similarly, great strides have been made in multi-agent reinforcement learning. Systems of cooperative autonomous robots are increasingly being used, for which multi-agent reinforcement learning can be used as a training method. However, the curse of dimensionality remains a problem for the computational speed of the learning algorithm and the bandwidth of communication channels. This research will focus mainly on reducing the problem of overloading communication channels by trying to reduce the number of communications. This is possible since it is usually unnecessary for every agent to communicate with every other agent constantly.
To do this, we use some ideas by Daniel Jarne Ornia. The first is to reduce communications in a multi-agent reinforcement learning system by treating it as an event-triggered control problem. This method uses so-called robustness surrogates as an equivalent to a Lyapunov function to determine if a communication can be skipped without decreasing the performance more than some tolerance. The second is a method to increase the observational robustness of a policy by using lexicographic reinforcement learning.
We aim to combine these ideas and trade the additional observational robustness for decreased communications. We also want to test whether additional observational robustness can help mitigate the sim-to-real gap. We implement this method for the multi-agent deep deterministic policy gradient algorithm and perform tests on a variant of the predator-prey domain in increasingly more realistic simulations.
We found that the combination of this robust policy and the robustness surrogates method does enable the agents to achieve the same return while communicating less. Unfortunately, our research shows that the observational robustness obtained using lexicographic reinforcement learning, does not help mitigate the sim-to-real gap.