Authors: WANG Mingda[1], ZHAO Baoxi, WU Zhisheng, LENG Gaoqiang (College of Mechanical and Electrical Engineering, China University of Petroleum, Qingdao, Shandong 266580, China)
Affiliation: [1] College of Mechanical and Electrical Engineering, China University of Petroleum (East China), Qingdao, Shandong 266580
Source: Journal of Safety Science and Technology (中国安全生产科学技术), 2025, No. 2, pp. 139-145 (7 pages)
Funding: National Natural Science Foundation of China (52075549).
Abstract: Sample scarcity significantly degrades the entity recognition accuracy of large language models (LLMs) on gas accident investigation reports. To address this problem, an LLM entity recognition method based on two-stage training is proposed. In the dataset construction stage, an LLM automatically generates a gas accident investigation report dataset following a conversational instruction fine-tuning template, and easy data augmentation (EDA) is used to expand the manually annotated key samples. In the model fine-tuning stage, low-rank adaptation (LoRA) is used to fine-tune the Phi3-mini-128k model: the first stage fine-tunes the model on the dataset automatically annotated by the LLM, and the second stage further fine-tunes the first-stage model on the augmented dataset. The results show that after first-stage fine-tuning, the composite evaluation metric for entity recognition of the Phi3-mini-rq model improves by 11.01 percentage points. Second-stage fine-tuning performs best when EDA-augmented data accounts for 50% of the total data, raising the composite metric by a further 2.49 percentage points. These results can provide effective technical support for the automated processing of accident reports in the gas sector.
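The abstract does not say how EDA is applied to annotated entity-recognition samples or how the 50% share is assembled. The following is a minimal Python sketch, not the authors' implementation, assuming that samples are BIO-tagged token sequences and that the four classic EDA operations (synonym replacement, random insertion, random swap, random deletion) touch only non-entity ("O") tokens so that entity labels stay valid. The synonym table, the entity types `EQUIP` and `ACCIDENT`, and the helper names (`eda_augment`, `mix_to_ratio`) are hypothetical. `mix_to_ratio` shows one way to make augmented samples a chosen fraction (e.g. 0.5) of the combined training set.

```python
import random
from typing import Dict, List, Tuple

# One training sample: (token, BIO tag) pairs. Entity types here are hypothetical.
Sample = List[Tuple[str, str]]

# Hypothetical toy synonym table; a real setup would need a domain thesaurus.
SYNONYMS: Dict[str, List[str]] = {
    "leak": ["escape", "release"],
    "caused": ["triggered", "produced"],
    "serious": ["severe", "major"],
}


def _outside(sample: Sample) -> List[int]:
    """Indices of 'O' tokens; only these are edited, so entity spans survive."""
    return [i for i, (_, tag) in enumerate(sample) if tag == "O"]


def synonym_replace(sample: Sample, rng: random.Random) -> Sample:
    out = list(sample)
    cands = [i for i in _outside(out) if out[i][0] in SYNONYMS]
    if cands:
        i = rng.choice(cands)
        out[i] = (rng.choice(SYNONYMS[out[i][0]]), "O")
    return out


def random_insert(sample: Sample, rng: random.Random) -> Sample:
    out = list(sample)
    words = [tok for tok, tag in out if tag == "O" and tok in SYNONYMS]
    if words:
        new_word = rng.choice(SYNONYMS[rng.choice(words)])
        # Never insert in front of an I- tag: that would split an entity span.
        slots = [p for p in range(len(out) + 1)
                 if p == len(out) or not out[p][1].startswith("I-")]
        out.insert(rng.choice(slots), (new_word, "O"))
    return out


def random_swap(sample: Sample, rng: random.Random) -> Sample:
    out = list(sample)
    idx = _outside(out)
    if len(idx) >= 2:
        i, j = rng.sample(idx, 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(sample: Sample, rng: random.Random, p: float = 0.1) -> Sample:
    # Each 'O' token is dropped with probability p; entity tokens are always kept.
    out = [pair for pair in sample if pair[1] != "O" or rng.random() >= p]
    return out if out else list(sample)


OPS = (synonym_replace, random_insert, random_swap, random_delete)


def eda_augment(sample: Sample, n_aug: int, rng: random.Random) -> List[Sample]:
    """Create n_aug variants, each produced by one randomly chosen EDA operation."""
    return [rng.choice(OPS)(sample, rng) for _ in range(n_aug)]


def mix_to_ratio(original: List[Sample], augmented: List[Sample],
                 aug_fraction: float, rng: random.Random) -> List[Sample]:
    """Combine data so augmented samples make up aug_fraction of the total."""
    if not 0 <= aug_fraction < 1:
        raise ValueError("aug_fraction must be in [0, 1)")
    wanted = round(len(original) * aug_fraction / (1 - aug_fraction))
    if wanted > len(augmented):
        raise ValueError("not enough augmented samples for this ratio")
    mixed = list(original) + rng.sample(augmented, wanted)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    rng = random.Random(0)
    seed = [("gas", "O"), ("pipeline", "B-EQUIP"), ("joint", "I-EQUIP"),
            ("leak", "O"), ("caused", "O"), ("a", "O"), ("serious", "O"),
            ("explosion", "B-ACCIDENT")]
    variants = eda_augment(seed, 4, rng)
    train = mix_to_ratio([seed], variants, 0.5, rng)  # 1 original + 1 augmented
    for s in train:
        print(" ".join(tok for tok, _ in s))
```

Restricting every edit to "O" tokens is a deliberately conservative choice. It leaves each annotated entity untouched and keeps the BIO sequence well-formed, at the cost of never varying the entity mentions themselves.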
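Keywords: gas accident investigation report; named entity recognition; large language model; instruction fine-tuning; data augmentation
CLC number: X937 [Environmental Science and Engineering: Safety Science]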