检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:陈斌[1] 刘卫国[2] CHEN Bin;LIU Weiguo(School of Automation,Central South University,Changsha 410083,China;School of Computer Science and Engineering,Central South University,Changsha 410083,China)
机构地区:[1]中南大学自动化学院,长沙410083 [2]中南大学计算机学院,长沙410083
出 处:《计算机科学与探索》2021年第9期1680-1693,共14页Journal of Frontiers of Computer Science and Technology
基 金:国家自然科学基金(61073187)。
摘 要:遗传算法(GA)的全局搜索能力强,易于操作,但其收敛速度慢,易陷入局部最优值。针对以上问题,利用深度强化学习模型SAC对遗传算法进行改进,并将其应用至旅行商问题(TSP)的求解。改进算法将种群作为与智能体(agent)交互的环境,引入贪心算法对环境进行初始化,使用改进后的交叉与变异运算作为agent的动作空间,将种群的进化过程视为一个整体,以最大化种群进化过程的累计奖励为目标,结合当前种群个体适应度情况,采用基于SAC的策略梯度算法,生成控制种群进化的动作策略,合理运用遗传算法的全局和局部搜索能力,优化种群的进化过程,平衡种群收敛速度与遗传操作次数之间的关系。对TSPLIB实例的实验结果表明,改进的遗传算法可有效地避免陷入局部最优解,在提高种群收敛速度的同时,减少寻优过程的迭代次数。Genetic algorithm(GA)has strong global searching ability and is easy to operate,but its disadvantages such as poor convergence speed,unstable and easy to fall into local optimal value restrict its application.In order to overcome these disadvantages,an improved genetic algorithm based on the deep reinforcement learning model SAC(soft actor-critic)is proposed in this paper,which is applied to the resolution of traveling salesman problem(TSP).The improved algorithm regards the population as agent.s interaction environment,meanwhile greedy algorithm is used to initialize this environment for improving the quality of initial populations.For controlling the evolution of the population,the improved crossover and mutation operations are used as agent.s action space.With the goal of maximizing the cumulative rewards of population evolution,the improved algorithm treats the evolution of the population as a whole and uses a policy gradient algorithm based on SAC to generate evolution controlling action strategy combined with the current individual fitness of the population.The action strategy reasonably uses the global and local search ability of genetic algorithm by agent.s actions,optimizing the evolutionary process of the population while balancing relationship between the population convergence rate and the times of genetic operation.The experimental results of TSPLIB indicate that the improved genetic algorithm can effectively avoid falling into the local optimal solution and reduce the number of iteration in the optimization process while improving the convergence rate of the population.
关 键 词:强化学习 遗传算法(GA) 旅行商问题(TSP) 深度策略梯度 soft actor-critic(SAC)模型
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.249