The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games

by Chao Yu, …, Eugene Vinitsky, et al.

The paper doesn't propose a new algorithm, but provides empirical evidence on the usefulness of PPO in MARL
PPO is overlooked in MARL because people think it would be sample inefficient
But the authors found that it was sample efficient as well as produced good results
They use the usual modifications to PPO, like GAE (Generalized Advantage Estimation), plus modifications specific to MARL, like Death Masking (pg. 14)
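
For reference, a minimal sketch of GAE for a single agent's trajectory, in plain Python. This is the standard GAE recursion, not the authors' code; the function name `compute_gae` and the default `gamma` and `lam` values are assumptions for illustration, not settings from the paper. Death Masking is not shown here.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Return (advantages, returns) for one trajectory of one agent.

    rewards[t], values[t], dones[t] describe step t; dones[t] is True when
    the episode ended at step t. last_value is the value estimate for the
    state after the final step (used for bootstrapping).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated advantage.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors (weight gamma * lam).
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets are the advantages plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    adv, ret = compute_gae(
        rewards=[1.0, 1.0],
        values=[0.0, 0.0],
        dones=[False, False],
        last_value=2.0,
        gamma=0.5,
        lam=0.5,
    )
    print(adv, ret)
```

With `lam=1` this reduces to the discounted return minus the value baseline, and with `lam=0` to the one-step TD error, which is the bias/variance trade-off GAE is meant to tune.
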
Limitation (pg. 10): the experiments were only done on environments with
- discrete action space
- collaborative problems
- homogeneous agents