The second category of algorithms update the parameters to obtain the optimal strategy by continuously calculating the gradient of the expected total return of the strategy, which is more widely used. <br>The Deep Deterministic Policy Gradient (DDPG) algorithm proposed by Lillicrap et al. [9] is a typical deep policy gradient algorithm based on the Actor-Critic [10] architecture. <br>It draws on the idea of DQN experience playback and separation of target networks, and successfully extends the discrete behavior space to the continuous behavior space. <br>In addition, in practical applications,
正在翻译中..