Improved Double Deep Q Network Algorithm Based on Average Q-Value Estimation and Reward Redistribution for Robot Path Planning  

Authors: Yameng Yin, Lieping Zhang, Xiaoxu Shi, Yilin Wang, Jiansheng Peng, Jianchu Zou

Affiliations: [1] Key Laboratory of Advanced Manufacturing and Automation Technology, Guilin University of Technology, Education Department of Guangxi Zhuang Autonomous Region, Guilin 541006, China; [2] Guangxi Key Laboratory of Special Engineering Equipment and Control, Guilin University of Aerospace Technology, Guilin 541004, China; [3] Guilin Mingfu Robot Technology Company Limited, Guilin 541199, China; [4] Key Laboratory of AI and Information Processing, Education Department of Guangxi Zhuang Autonomous Region, Hechi University, Yizhou 546300, China

Source: Computers, Materials & Continua, 2024, No. 11, pp. 2769-2790 (22 pages)

Funding: Funded by the National Natural Science Foundation of China (No. 62063006); Guangxi Science and Technology Major Program (No. 2022AA05002); Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region (No. 2022GXZDSY003); and the Central Leading Local Science and Technology Development Fund Project of Wuzhou (No. 202201001).

Abstract: By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied to the path planning of mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. To address these problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution is proposed. First, to enhance the precision of the target Q-value, the single Q-value from the current target Q network is replaced with the average of multiple previously learned Q-values from the target Q network. Next, a reward redistribution mechanism is designed to overcome the sparse reward problem by adjusting the final reward of each action using the round reward from trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples according to their reward values to ensure frequent utilization of high-quality data. Finally, simulation experiments are conducted to verify the effectiveness of the proposed algorithm in a fixed-position scenario and in random environments. The experimental results show that, compared with the traditional DDQN algorithm, the proposed algorithm achieves shorter average running time, higher average return, and fewer average steps. The performance of the proposed algorithm improves by 11.43% in the fixed scenario and by 8.33% in the random environments. It not only plans economical and safe paths but also significantly improves efficiency and generalization in path planning, making it suitable for widespread application in autonomous navigation and industrial automation.
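The abstract names three mechanisms (average Q-value estimation, reward redistribution, and reward-prioritized experience selection) without giving their exact formulas. The Python sketch below is only an illustration of how such mechanisms could fit together around a DDQN agent; the class name AveragedTargetDDQN, the callables q_online and the stored snapshot functions, and the constants K_SNAPSHOTS and REDIST_WEIGHT are hypothetical placeholders, not the authors' implementation.

import random
from collections import deque
import numpy as np

GAMMA = 0.99         # discount factor (illustrative value, not from the paper)
K_SNAPSHOTS = 5      # how many past target-network estimates to average (assumption)
REDIST_WEIGHT = 0.1  # fraction of the round reward added to each step's reward (assumption)

class AveragedTargetDDQN:
    """Toy container for the three mechanisms named in the abstract.
    q_online and each stored snapshot are stand-ins for neural networks:
    callables mapping a state to a NumPy vector of action values."""

    def __init__(self, q_online):
        self.q_online = q_online
        self.q_snapshots = deque(maxlen=K_SNAPSHOTS)  # previously learned target-Q functions
        self.replay = []                              # (s, a, r, s2, done) transitions

    def store_snapshot(self, q_target_fn):
        # Called whenever the target network would normally be refreshed.
        self.q_snapshots.append(q_target_fn)

    def target_value(self, r, s2, done):
        # Average Q-value estimation: replace the single estimate of the
        # current target network with the mean over the stored snapshots.
        if done or not self.q_snapshots:
            return r
        a_star = int(np.argmax(self.q_online(s2)))           # action picked by the online net (DDQN)
        avg_q = float(np.mean([q(s2)[a_star] for q in self.q_snapshots]))
        return r + GAMMA * avg_q

    def redistribute_rewards(self, episode):
        # Reward redistribution: use the round (episode) return to adjust the
        # reward of every action on the trajectory, easing reward sparsity.
        round_return = sum(r for (_, _, r, _, _) in episode)
        for (s, a, r, s2, done) in episode:
            self.replay.append((s, a, r + REDIST_WEIGHT * round_return, s2, done))

    def sample_batch(self, batch_size):
        # Reward-prioritized experience selection: rank transitions by their
        # (redistributed) reward and draw the training batch from the top ranks.
        ranked = sorted(self.replay, key=lambda t: t[2], reverse=True)
        pool = ranked[:max(4 * batch_size, batch_size)]
        return random.sample(pool, min(batch_size, len(pool)))

A real implementation would wrap these pieces around a full DDQN training loop (network updates, epsilon-greedy exploration, and so on); the sketch only shows where each modification plugs in.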

Keywords: Double Deep Q Network; path planning; average Q-value estimation; reward redistribution mechanism; reward-prioritized experience selection method

Classification: TP18 [Automation and Computer Technology - Control Theory and Control Engineering]; TP242 [Automation and Computer Technology - Control Science and Engineering]

 
