Authors: WU Peng [1,2]; WEI Shang-qing; DONG Jia-peng; PAN Li (吴鹏, 魏上清, 董嘉鹏, 潘理)
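Affiliations: [1] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; [2] National Engineering Laboratory for Information Content Analysis Technology, Shanghai 200240, China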
Source: Computer Technology and Development (《计算机技术与发展》), 2022, No. 9, pp. 82-88 (7 pages)
Funding: National Natural Science Foundation of China (62002219); Shanghai Sailing Program (19YF1424700).
Abstract: To optimize the scheduling of judges under the judge quota system and to balance limited judicial resources against actual judicial demand, this paper builds a scheduling optimization model for trial human resources and proposes a reinforcement-learning-based scheduling strategy for trial teams. Based on an analysis of the judicial personnel scheduling problem and its scenarios, a mathematical model is established whose objective is to minimize the average processing time of cases, together with the corresponding constraints. On this basis, a macroscopic queuing model of the judicial system is established, the Markov decision process for trial human resource scheduling is defined, and a dynamic, adaptive reinforcement learning algorithm for judicial personnel scheduling is proposed, based on the SARSA (State-Action-Reward-State-Action) algorithm. The algorithm uses the average processing time of cases as the reward, selects scheduling actions through a greedy behavior policy, and uses temporal-difference updates to learn the optimal scheduling policy while interacting with the judicial system. Compared with the traditional case assignment method and other simple rule-based heuristic algorithms, the proposed algorithm improves the efficiency of case trials and optimizes the allocation of human resources.
Keywords: reinforcement learning; resource scheduling; decision optimization; greedy strategy; Markov decision process
Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
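Illustrative note: the abstract names SARSA with a greedy behavior policy and temporal-difference updates but gives no implementation details. The sketch below shows the general shape of tabular SARSA on a toy assignment problem. It is not the authors' model. Everything in it is an assumption made for illustration: the two-team environment, the constants `MAX_Q` and `SERVICE_P`, the use of an epsilon-greedy policy, and the reward, which is the negative number of pending cases used as a rough proxy for average processing time.

```python
import random
from collections import defaultdict

MAX_Q = 5                # hypothetical cap on each team's backlog
SERVICE_P = (0.9, 0.3)   # hypothetical per-step completion probability per team


def step(state, action, rng):
    """Assign one arriving case to team `action`, then let each team work."""
    queues = list(state)
    queues[action] = min(queues[action] + 1, MAX_Q)
    for i, p in enumerate(SERVICE_P):
        if queues[i] > 0 and rng.random() < p:
            queues[i] -= 1
    # Assumed reward: fewer pending cases stands in for shorter processing time.
    reward = -float(sum(queues))
    return tuple(queues), reward


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-valued team."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    return max((0, 1), key=lambda a: q[(state, a)])


def train_sarsa(episodes=300, horizon=100, alpha=0.1, gamma=0.9,
                epsilon=0.1, seed=0):
    """Tabular SARSA: update Q(s, a) using the action actually taken next."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = (0, 0)
        action = epsilon_greedy(q, state, epsilon, rng)
        for _ in range(horizon):
            next_state, reward = step(state, action, rng)
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            # On-policy temporal-difference target.
            td_target = reward + gamma * q[(next_state, next_action)]
            q[(state, action)] += alpha * (td_target - q[(state, action)])
            state, action = next_state, next_action
    return q


if __name__ == "__main__":
    q_table = train_sarsa()
    for s in [(0, 0), (2, 2), (0, 4)]:
        print(s, "->", epsilon_greedy(q_table, s, 0.0, random.Random(0)))
```

The defining feature of SARSA is that the update bootstraps from the next action the behavior policy actually selects, rather than from the maximum over actions as in Q-learning. A realistic version would need the queuing model, state definition, and constraints from the paper itself.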