Large-scale Agile Software Project Scheduling Based on Deep Reinforcement Learning  Cited by: 1


Authors: SHEN Xiaoning [1,2,3,4]; MAO Mingjian; SHEN Ruyi; SONG Liyan

Affiliations: [1] School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China; [2] Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China; [3] Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China; [4] Jiangsu Engineering Research Center on Meteorological Energy Using and Control (C-MEIC), Nanjing 210044, China; [5] Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Southern University of Science and Technology, Shenzhen 518055, China

Source: Journal of Zhengzhou University (Engineering Science), 2023, No. 5, pp. 17-23 (7 pages)

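Funding: National Natural Science Foundation of China (61502239, 62002148); Guangdong Provincial Key Laboratory project (2020B121201001); Natural Science Foundation of Jiangsu Province (BK20150924).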

Abstract: To solve the large-scale agile software project scheduling problem, the problem is first decomposed into three strongly coupled subproblems: story selection, story allocation, and task allocation. Dynamic events, such as the addition and deletion of user stories and changes in employees' working hours in each sprint, are introduced, and constraints on team development velocity, task duration, and skills are considered. A mathematical model of large-scale agile software project scheduling is established with the objective of maximizing the total value of the user stories completed by the project. Second, a Markov decision process is designed according to the characteristics of the problem: 10 state features describe the agile scheduling environment at the beginning of each sprint, 12 composite scheduling rules serve as the agent's candidate actions, and the reward is defined according to the objective function of the scheduling model. Finally, a prioritized experience replay double deep Q network algorithm based on composite scheduling rules (CPDDQN) is proposed to solve the model. The double deep Q network (DDQN) strategy and the prioritized experience replay strategy are introduced to avoid the overestimation problem of the deep Q network and to improve the utilization efficiency of trajectory information in the experience replay pool. To verify the effectiveness of the proposed algorithm, experiments were carried out on six large-scale agile software project scheduling instances, and the convergence of the algorithm was analyzed. In terms of the algorithm performance measures, CPDDQN was compared with the existing representative algorithms DQN and DDQN and with methods that each use only a single composite scheduling rule. The results show that CPDDQN obtained the highest average cumulative reward on all six instances.

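Keywords: reinforcement learning; large-scale; agile software project scheduling; deep Q network; composite scheduling rules; prioritized experience replay; strong coupling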

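CLC number: TP311.5 [Automation and Computer Technology - Computer Software and Theory]; TP301.6 [Automation and Computer Technology - Computer Science and Technology]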
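
Illustrative sketch (not from the paper): the abstract names two generic ingredients, the double deep Q network target and prioritized experience replay. The Python below shows the textbook form of each, assuming that Q-values are plain lists with one entry per candidate action (12 composite scheduling rules in the abstract) and that replay uses proportional prioritization. All function names and the values of `gamma`, `alpha`, `beta`, and `eps` are hypothetical choices for illustration; the record does not state the authors' network, state features, or hyperparameters.

```python
import random


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target for one transition.

    The online network picks the greedy next action and the target network
    evaluates it, which is the usual remedy for Q-value overestimation.
    gamma is an assumed discount factor, not a value from the paper.
    """
    if done:
        return reward
    # Action selection uses the online network's estimates.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation uses the target network's estimate.
    return reward + gamma * next_q_target[best_action]


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha,
    with p_i = |td_error_i| + eps. alpha and eps are assumed values."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(td_errors, batch_size, rng, alpha=0.6, beta=0.4):
    """Sample replay indices by priority and return importance-sampling weights.

    Weights are (N * P(i))^(-beta), normalized by the largest weight in the
    batch so that every weight lies in (0, 1].
    """
    probs = per_probabilities(td_errors, alpha)
    n = len(probs)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]


if __name__ == "__main__":
    # Toy Q-values over 3 actions; a real agent would have 12 (one per rule).
    print(double_dqn_target(1.0, [0.2, 0.9, 0.1], [0.5, 0.3, 0.8], gamma=0.5))
    # TD errors of four stored transitions; larger errors are sampled more often.
    print(sample_batch([0.1, 2.0, 0.5, 0.05], batch_size=3, rng=random.Random(0)))
```

In this sketch the selected action index would map to one of the composite scheduling rules, which then produces the sprint schedule. That mapping depends on the rule definitions in the full paper and is deliberately left out.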

 
