And we also found that the total reward value obtained by the system in one cycle (20 time slots) increases with the increase of the number of users, which is mainly related to the collaborative multi-agent reinforcement learning environment reward is set as the global reward for all users
正在翻译中..