Legal judgment prediction based on pre-training model and knowledge distillation

Cited by: 12

Authors: PAN Rui-dong; KONG Wei-jian [1,2]; QI Jie

Affiliations: [1] School of Information Science and Technology, Donghua University, Shanghai 201600, China; [2] Engineering Research Center of Digitized Textile and Fashion Technology of the Ministry of Education, Donghua University, Shanghai 201620, China

Source: Control and Decision (控制与决策), 2022, Issue 1, pp. 67-76.

Funding: National Natural Science Foundation of China (61773112, 61603088).

Abstract: For the charge prediction and law article recommendation sub-tasks of legal judgment prediction, a multi-task, multi-label text classification model is proposed based on the BERT (bidirectional encoder representations from transformers) pre-training model and a knowledge distillation strategy. To exploit the correlation between the two sub-tasks and improve prediction accuracy, multi-task learning over the BERT pre-training model is used to build the text classification model BERT-multi. To address the imbalanced samples across charge and law article categories, a grouped focal loss is adopted to strengthen the model's ability to distinguish rare charges and law articles. To reduce computational complexity and increase inference speed, a knowledge distillation strategy that takes the teacher model's evaluation as a reference is proposed: by dynamically balancing the distillation loss and the classification loss, BERT-multi is compressed into a student model with a shallow structure. The result is a multi-task, multi-label text classification model that handles imbalanced samples and infers quickly. Experiments on the CAIL2018 dataset show that the pre-training model and the grouped focal loss significantly improve legal judgment prediction performance, and that with the teacher-evaluation-guided distillation the student model's inference speed nearly doubles while achieving F1-Scores (the mean of Micro-F1 and Macro-F1) of 86.7% on charge prediction and 83.0% on law article recommendation.
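
To make the grouped focal loss concrete, a minimal PyTorch sketch follows. The record above does not specify how charges and law articles are grouped, so the frequency-group mapping group_ids, the per-group focusing factors group_gamma, and the default alpha below are illustrative assumptions rather than the authors' implementation; only the focal-loss form itself, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), is the standard one.

import torch

def grouped_focal_loss(logits, targets, group_ids, group_gamma, alpha=0.25):
    # Multi-label focal loss with a per-group focusing factor gamma.
    # logits, targets: (batch, num_labels), targets holding 0/1 floats.
    # group_ids: (num_labels,) long tensor assigning each charge or law
    # article to a frequency group (e.g. frequent / medium / rare) --
    # the grouping scheme here is an assumption, not the paper's.
    # group_gamma: (num_groups,) float tensor; rare groups get a larger
    # gamma so easy, frequent labels contribute less to the loss.
    p = torch.sigmoid(logits)
    p_t = torch.where(targets == 1.0, p, 1.0 - p)  # prob. of the true outcome
    gamma = group_gamma[group_ids]                  # broadcast over the batch
    loss = -alpha * (1.0 - p_t).pow(gamma) * torch.log(p_t.clamp(min=1e-8))
    return loss.mean()

Down-weighting well-classified examples is what lets rare charges and law articles keep a meaningful share of the gradient, which matches the abstract's stated motivation for the grouped loss.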
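
The teacher-evaluation-guided distillation can be sketched in the same spirit. The record does not state how the teacher's evaluation is scored, so the per-batch weight w below (the teacher's average confidence on the gold labels), the temperature T, and the function name dynamic_distillation_loss are all assumptions; what is grounded in the abstract is only that a distillation loss and a classification loss are combined with a dynamically adjusted balance.

import torch
import torch.nn.functional as F

def dynamic_distillation_loss(student_logits, teacher_logits, targets, T=2.0):
    # Soft-target (distillation) loss: per-label BCE of the student against
    # the teacher's temperature-softened probabilities; the T*T factor
    # restores the gradient scale after dividing the logits by T.
    soft_targets = torch.sigmoid(teacher_logits / T)
    kd_loss = F.binary_cross_entropy_with_logits(
        student_logits / T, soft_targets) * T * T
    # Hard-label classification loss on the ground-truth labels.
    cls_loss = F.binary_cross_entropy_with_logits(student_logits, targets)
    # Dynamic balance (an assumed reading of "teacher model evaluation as
    # a reference"): weight the distillation term by how much probability
    # the teacher assigns to the gold labels in this batch.
    with torch.no_grad():
        teacher_p = torch.sigmoid(teacher_logits)
        w = (teacher_p * targets).sum() / targets.sum().clamp(min=1.0)
    return w * kd_loss + (1.0 - w) * cls_loss

Under this weighting, batches where the teacher is confident and correct lean on the soft targets, while batches where the teacher is unreliable fall back to the hard labels, which is one way to realize the dynamic balancing the abstract describes.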

Keywords: legal judgment prediction; pre-training model; focal loss; multi-task learning; model compression; knowledge distillation

CLC number: TP273 (Automation and Computer Technology: Detection Technology and Automatic Devices)

 
