Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms

Haifei Zhang et al. Comput Intell Neurosci. 2022.

Abstract

The traditional Deep Deterministic Policy Gradient (DDPG) algorithm is widely used for continuous action spaces, but it tends to fall into local optima and exhibits large error fluctuations. To address these deficiencies, this paper proposes a dual-actor, dual-critic DDPG algorithm (DN-DDPG). First, a second critic network is added to the original actor-critic architecture to assist training, and the smaller of the two critics' Q values is taken as the estimated value of the action in each update, which reduces the probability of falling into a local optimum. Then, a dual-actor network is introduced to alleviate the value underestimation caused by the dual-critic network: the action with the greatest value among those proposed by the two actor networks is selected for the update, which stabilizes training. Finally, the improved method is validated on four continuous action tasks provided by MuJoCo, and the results show that it reduces the fluctuation range of the error and improves the cumulative return compared with the classical algorithm.
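The two selection rules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`clipped_value`, `select_action`, `td_target`), the toy actors and critics, the discount factor, and the choice to rank the actors' proposals by the smaller of the two critic estimates are all assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import Callable, Sequence

State = Sequence[float]
Action = float
Critic = Callable[[State, Action], float]
Actor = Callable[[State], Action]


def clipped_value(critics: Sequence[Critic], state: State, action: Action) -> float:
    """Return the smallest Q estimate among the critics (the pessimistic value)."""
    return min(q(state, action) for q in critics)


def select_action(actors: Sequence[Actor], critics: Sequence[Critic], state: State) -> Action:
    """Among the actions proposed by the actors, keep the one with the largest value.

    Assumption: each proposal is scored with the clipped (minimum) critic value.
    """
    candidates = [actor(state) for actor in actors]
    return max(candidates, key=lambda a: clipped_value(critics, state, a))


def td_target(reward: float, next_state: State, done: bool,
              actors: Sequence[Actor], critics: Sequence[Critic],
              gamma: float = 0.99) -> float:
    """One-step bootstrapped target built from the selected action and clipped value."""
    if done:
        return reward
    next_action = select_action(actors, critics, next_state)
    return reward + gamma * clipped_value(critics, next_state, next_action)


# Hypothetical stand-ins for trained networks, on a one-dimensional state.
def actor_a(s: State) -> Action:
    return 0.5 * s[0]


def actor_b(s: State) -> Action:
    return -0.2 * s[0]


def critic_1(s: State, a: Action) -> float:
    return -(a - 0.4) ** 2


def critic_2(s: State, a: Action) -> float:
    return -(a - 0.6) ** 2 - 0.1


if __name__ == "__main__":
    actors, critics = [actor_a, actor_b], [critic_1, critic_2]
    state = [1.0]
    print("selected action:", select_action(actors, critics, state))
    print("target:", td_target(1.0, state, False, actors, critics, gamma=0.9))
```

In a full agent the stand-in functions would be target networks, and the target would feed the critics' regression loss. Taking the minimum over critics makes the estimate pessimistic, while choosing the better of the two actors' proposals partly offsets that pessimism, which is the trade-off the abstract describes.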

Copyright © 2022 Haifei Zhang et al.

Conflict of interest statement

The authors declare that there are no conflicts of interest to report regarding the present study.

Figures

Figure 1. The structure of the DDPG algorithm.

Figure 2. Architecture of the dual actors and dual critics.

Figure 3. Arm_easy environment and task.

Figure 4. Mountain car continuous.

Figure 5. Half cheetah.

Figure 6. Comparative experiments of three algorithms in eight different continuous action tasks. (a) Arm_easy. (b) Arm_hard. (c) Pendulum. (d) Mountain car continuous. (e) Half cheetah. (f) Humanoid. (g) Hopper. (h) Walker2d.

Algorithm 1. The DN-DDPG process.
