This work was supported by the Program A for Outstanding PhD Candidate of Nanjing University,National Science Foundation of China(61876077);Jiangsu Science Foundation(BK20170013);Collaborative Innovation Center of Novel Software Technology and Industrialization.
Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments.In an unknown environment,the agent needs to explore the environment while exploiting the collected...