Deep reinforcement learning algorithm for the type I two-sided assembly line balancing problem

Authors: CHENG Wei; ZHANG Yahui [2]; CAO Xianfeng; JIN Zengzhi; HU Xiaofeng [1]

Affiliations: [1] School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; [2] Institute of Marine Equipment, Shanghai Jiao Tong University, Shanghai 200240, China; [3] Process Research Institution, China National Heavy Duty Truck Group Co., Ltd, Jinan 250100, China

Source: Computer Integrated Manufacturing Systems, 2024, No. 2, pp. 508-519 (12 pages)

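Funding: National Natural Science Foundation of China (51975373); Shanghai Jiao Tong University Start-up Program for Newly Recruited Young Faculty (22X010503668).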

Abstract: Traditional optimization algorithms cannot make effective use of historical solving experience when solving the type I two-sided assembly line balancing problem, and they have difficulty reaching the optimal solution. To address this, a deep reinforcement learning algorithm named Proximal Policy Optimization with Convolutional Neural Networks (CNN-PPO) is proposed. The agent structure of CNN-PPO is designed: building on Proximal Policy Optimization (PPO), a Convolutional Neural Network (CNN) is introduced to strengthen the agent's ability to extract features from the data. Based on the characteristics of the two-sided assembly line balancing problem, a state matrix is defined to describe the problem, and a mask layer is introduced to assist the agent in task decisions. A reward function is designed according to the optimization objective and, combined with the online execution-evaluation (Actor-Critic) mechanism of reinforcement learning, the best task to assign is selected at each decision. The effectiveness and stability of the algorithm are verified on multiple test cases. The experimental results show that the proposed method yields superior solutions: 57 of the 59 test cases reach the lower bound.
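The abstract names two mechanisms without giving their details: a mask layer that restricts the agent's choice of task, and the PPO update. The sketch below shows one common way each is written, a masked softmax over candidate tasks and the standard PPO clipped surrogate. It is a minimal illustration, not the authors' implementation: the function names, the example logits, the feasibility flags, and the clip range `eps=0.2` are all assumptions, and the paper's state matrix, CNN, and reward function are not reproduced here.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits; entries with mask[i] == False get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    peak = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def ppo_clip_objective(new_probs, old_probs, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize)."""
    terms = []
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


# Hypothetical decision step: four candidate tasks, of which tasks 1 and 3
# are assumed infeasible (for example, their predecessors are unassigned).
logits = [1.0, 2.0, 0.5, 3.0]
feasible = [True, False, True, False]
probs = masked_softmax(logits, feasible)
```

In this sketch the infeasible tasks receive zero probability, so sampling from `probs` can only return a feasible task; the clipped objective then limits how far a single update can move the policy away from the one that collected the data.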

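Keywords: type I two-sided assembly line balancing problem; deep reinforcement learning; convolutional neural network; proximal policy optimization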

Classification No.: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
