Research on Public Security Professional Small Sample Knowledge Extraction Method Based on Large Language Model

Original title: 基于大语言模型的公安专业小样本知识抽取方法研究

Cited by: 1

Authors: PEI Bingsen (裴炳森); LI Xin (李欣) [1]; JIANG Zhangtao (蒋章涛); LIU Mingshuai (刘明帅)

Affiliation: [1] School of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2024, No. 10, pp. 2630-2642 (13 pages)

Funding: National Key Research and Development Program of China (2022300070005).

Abstract: The informatization and digitalization of public security work have advanced rapidly and generated a large volume of law enforcement case data. Because these texts come in many types and carry a large amount of information, front-line police officers often face low reading efficiency and difficulty aggregating information when reviewing case files. Making fuller use of law enforcement case texts requires intelligent analysis and knowledge extraction. However, owing to the specialized nature, sensitivity, and confidentiality of these texts, together with the restrictions on moving public security data off the internal network, only a small number of training samples can be obtained, and traditional deep learning models extract knowledge from them poorly. This paper therefore proposes a method for building a vertical-domain large language model with limited resources and data and adapting it to the public security profession. The method uses the knowledge editing technique MEMIT (mass-editing memory in a transformer), the low-resource fine-tuning technique LoRA (low-rank adaptation), and prompt templates to improve the model's understanding of public security knowledge such as police terminology and police common sense. To further improve extraction performance, a small-sample extraction workflow for law enforcement case text is designed so that the model's professional knowledge of the relevant case categories is put to better use. Experimental results show that the public security vertical-domain large language model combined with this extraction workflow achieves significantly higher accuracy than traditional methods across the knowledge extraction tasks. This helps front-line police officers analyze law enforcement case texts quickly, objectively, and accurately, uncover latent case information, and support the intelligent development of public security work.

Keywords: large language model; knowledge extraction; small-sample data; public security; law enforcement case handling

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

