Trial Human Resources Scheduling Method Based on SARSA Reinforcement Learning

Cited by: 3

Authors: WU Peng (吴鹏) [1,2]; WEI Shang-qing (魏上清); DONG Jia-peng (董嘉鹏); PAN Li (潘理)

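Affiliations: [1] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; [2] National Engineering Laboratory for Information Content Analysis Technology, Shanghai 200240, China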

Source: Computer Technology and Development (《计算机技术与发展》), 2022, No. 9, pp. 82-88 (7 pages)

Funding: National Natural Science Foundation of China (62002219); Shanghai Sailing Program (19YF1424700).

Abstract: To optimize the scheduling of quota judges and to balance limited judicial resources against actual judicial demand, this paper builds an optimization model for trial human resources scheduling and proposes a reinforcement-learning-based scheduling strategy for trial teams. Based on an analysis of the trial personnel scheduling problem and its scenarios, a mathematical optimization model is formulated whose objective is to minimize the average case processing time, together with the corresponding constraints. On this basis, a macroscopic queuing model of the judicial system is established, the Markov decision process for trial human resources scheduling is defined, and a dynamic, adaptive reinforcement learning algorithm for trial personnel scheduling is proposed based on the State-Action-Reward-State-Action (SARSA) algorithm. The algorithm uses the average case processing time as the reward, selects scheduling actions through a greedy behavior policy, and applies temporal-difference updates to learn the optimal scheduling policy while interacting with the judicial system. Compared with traditional case-assignment methods and other simple rule-based heuristics, the algorithm can improve trial efficiency and optimize the allocation of human resources.
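The abstract names the ingredients of the method (a queuing model, a Markov decision process, a behavior policy, and SARSA temporal-difference updates) without giving details. The Python sketch below illustrates how these pieces usually fit together in tabular SARSA. It is not the authors' implementation. The two-queue environment, the arrival probabilities, the backlog cap, and all function names are hypothetical. The reward is taken as the negative backlog, an assumed proxy for average processing time, and an epsilon-greedy policy stands in for the greedy behavior policy (setting `epsilon=0` makes it purely greedy).

```python
import random
from collections import defaultdict

# All constants below are hypothetical, chosen only for illustration.
N_QUEUES = 2            # two case categories waiting for a trial team
MAX_LEN = 5             # cap on each backlog so the state space stays small
ARRIVAL_P = (0.6, 0.3)  # per-step probability that a new case arrives


def step(state, action, rng):
    """Toy dynamics: serve one case from the chosen queue, then admit arrivals."""
    queues = list(state)
    if queues[action] > 0:
        queues[action] -= 1
    for i, p in enumerate(ARRIVAL_P):
        if rng.random() < p and queues[i] < MAX_LEN:
            queues[i] += 1
    next_state = tuple(queues)
    # Assumed proxy: a smaller backlog stands for a shorter average processing time.
    reward = -sum(next_state)
    return next_state, reward


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random queue with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(N_QUEUES)
    return max(range(N_QUEUES), key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, r, s2, a2, alpha, gamma):
    """On-policy TD update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = r + gamma * q[(s2, a2)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])


def train(episodes=200, horizon=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular SARSA on the toy environment and return the Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        s = (0,) * N_QUEUES
        a = epsilon_greedy(q, s, epsilon, rng)
        for _ in range(horizon):
            s2, r = step(s, a, rng)
            a2 = epsilon_greedy(q, s2, epsilon, rng)  # next action from the same policy
            sarsa_update(q, s, a, r, s2, a2, alpha, gamma)
            s, a = s2, a2
    return q
```

Calling `train()` returns a Q-table keyed by `(state, action)` pairs. The defining feature of SARSA is visible in the inner loop: the next action `a2` is drawn from the behavior policy before the update, so the target uses the action that will actually be taken rather than the maximum over actions.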

Keywords: reinforcement learning; resource scheduling; decision optimization; greedy policy; Markov decision process

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
