Natural Science Foundation of Zhejiang Province, Grant/Award Number: LQ15F030006; Key Research and Development Program of Zhejiang Province, Grant/Award Number: 2018C01085.
The asynchronous advantage actor-critic (A3C) algorithm is a commonly used policy optimization algorithm in reinforcement learning, in which "asynchronous" refers to parallel interactive sampling and training, and "advantage" is a sa...