Entity recognition of gas accident investigation reports based on large language model


Authors: WANG Mingda[1]; ZHAO Baoxi; WU Zhisheng; LENG Gaoqiang (College of Mechanical and Electrical Engineering, China University of Petroleum, Qingdao, Shandong 266580, China)

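Affiliation: [1] College of Mechanical and Electrical Engineering, China University of Petroleum (East China), Qingdao, Shandong 266580, China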

Source: Journal of Safety Science and Technology (《中国安全生产科学技术》), 2025, No. 2, pp. 139-145 (7 pages)

Funding: National Natural Science Foundation of China (52075549).

Abstract: Sample scarcity significantly degrades the entity recognition accuracy of large language models (LLMs) on gas accident investigation reports. To address this problem, an LLM entity recognition method based on two-stage training is proposed. In the dataset construction stage, an LLM automatically generates a gas accident investigation report dataset according to a conversational instruction fine-tuning template, and the easy data augmentation (EDA) technique is used to expand the manually annotated key samples. In the model fine-tuning stage, low-rank adaptation (LoRA) is used to fine-tune the Phi3-mini-128k model: the first stage trains on the LLM-annotated dataset, and the second stage continues from the first-stage model using the augmented dataset. The results show that after the first-stage fine-tuning, the comprehensive evaluation index of entity recognition of the Phi3-mini-rq model improves by 11.01 percentage points. The second-stage fine-tuning works best when EDA-augmented data accounts for 50% of the total data, raising the comprehensive evaluation index by a further 2.49 percentage points. The results can provide effective technical support for the automated processing of accident reports in the gas field.
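The augmentation step in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of only two EDA operations (random swap and random deletion), and the way the 50% share is reached are all assumptions made for illustration. The abstract does not say how entity labels are kept consistent under augmentation, so the sketch works on plain token lists only.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """EDA 'random swap': exchange two randomly chosen positions n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """EDA 'random deletion': drop each token with probability p, keeping at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def mix_with_augmented(originals, augmented_share, rng):
    """Build a dataset in which augmented samples make up `augmented_share` of the total.

    Hypothetical helper: returns (dataset, number_of_augmented_samples).
    """
    if not 0 <= augmented_share < 1:
        raise ValueError("augmented_share must be in [0, 1)")
    # Solve n_aug / (n_orig + n_aug) = augmented_share for n_aug.
    n_aug = round(len(originals) * augmented_share / (1 - augmented_share))
    augmented = []
    for k in range(n_aug):
        source = originals[k % len(originals)]
        if k % 2 == 0:
            augmented.append(random_swap(source, 1, rng))
        else:
            augmented.append(random_deletion(source, 0.1, rng))
    return list(originals) + augmented, n_aug


rng = random.Random(0)
samples = [
    ["gas", "leak", "caused", "an", "explosion"],
    ["the", "pipeline", "valve", "failed"],
    ["residents", "reported", "a", "strong", "odor"],
    ["the", "operator", "closed", "the", "main", "line"],
]
dataset, n_aug = mix_with_augmented(samples, 0.5, rng)
```

With `augmented_share=0.5` the helper adds as many augmented samples as there are originals, which corresponds to the 50% setting reported as best. In a real NER pipeline, swapping or deleting tokens inside an annotated entity span would invalidate the labels, so the operations would need to be restricted to non-entity tokens or the labels re-derived afterwards.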

Keywords: gas accident investigation report; named entity recognition; large language model; instruction fine-tuning; data augmentation

Classification number: X937 [Environmental Science and Engineering: Safety Science]

 
