Entity extraction integrating lexical information for coal mine safety accidents


Authors: LYU Huilin; DONG Jiayao; YUAN Lin; LI Li [2]

Affiliations: [1] CCTEG Changzhou Research Institute, Changzhou 213015, Jiangsu, China; [2] College of Electrical and Control Engineering, Xi'an University of Science and Technology, Xi'an 710054, Shaanxi, China; [3] Huoshaopu Coal Mine, Guizhou Panjiang Refined Coal Co., Ltd., Liupanshui 553000, Guizhou, China

Source: Journal of Mine Automation (《工矿自动化》), 2025, Issue 4, pp. 131-139 (9 pages)

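Funding: National Key Research and Development Program of China (2023YFC3009800); Scientific Research Program of the Shaanxi Provincial Department of Education (23JK0152); Natural Science Basic Research Program of Shaanxi Province (2024JC-YBQN-0726, 2023-JC-QN-0001); Shaanxi Province Qinchuangyuan "Scientist + Engineer" Team Building Project (2022KXJ-38).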

Abstract: Named entity recognition (NER) is a foundational task in constructing knowledge graphs for coal mine safety accidents, but Chinese text lacks explicit word boundaries, so existing entity extraction models make insufficient use of lexical information. To address this problem, a RoBERTa-BiLSTM-CRF model integrated with lexical information is proposed for entity extraction in coal mine safety accidents. First, a domain-specific lexicon for coal mine safety is constructed; character feature vectors are obtained with RoBERTa, the Aho-Corasick (AC) automaton algorithm matches characters against the lexicon to find the candidate words for each character, and word feature vectors are obtained with GloVe. Next, a self-attention mechanism assigns weights to fuse the RoBERTa-based character feature vectors with the GloVe-based word feature vectors, yielding a fused vector that carries lexical information. Finally, the fused vector is fed into BiLSTM-CRF to obtain the optimal predicted label sequence, completing entity extraction for coal mine safety accidents. Experimental results show that: (1) the proposed model achieves an F1 score of 91.63% on the extraction of 12 entity types in the coal mine safety domain, 1.63% higher than the RoBERTa-BiLSTM-CRF model; (2) the proposed model outperforms the other models in overall performance, both on the overall entity extraction task and on each individual entity type, indicating that the architecture is broadly applicable to different entity types.
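The character-to-word matching step can be illustrated with a minimal sketch of an Aho-Corasick automaton. This is not the authors' implementation: the abstract names only the algorithm, so the function names `build_automaton` and `match_words`, the four-word lexicon, and the sample sentence are all hypothetical. The sketch returns, for every character position, the lexicon words that cover it, which is the kind of candidate-word list that would later be embedded and fused with the character vectors.

```python
from collections import deque


def build_automaton(words):
    """Build a trie with failure links over the lexicon (hypothetical helper)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # lexicon words that end at each state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def match_words(text, automaton):
    """Return, for each character of text, the lexicon words covering it."""
    goto, fail, out = automaton
    matches = [[] for _ in text]
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            # The word ends at position i; mark every character it spans.
            for j in range(i - len(word) + 1, i + 1):
                matches[j].append(word)
    return matches


# Hypothetical lexicon: gas, gas explosion, explosion, coal mine.
lexicon = ["瓦斯", "瓦斯爆炸", "爆炸", "煤矿"]
automaton = build_automaton(lexicon)
for ch, words in zip("煤矿瓦斯爆炸", match_words("煤矿瓦斯爆炸", automaton)):
    print(ch, words)
```

In this toy input, the character 瓦 is covered by both 瓦斯 and 瓦斯爆炸, so it carries two candidate words. Ambiguity of this kind is what a weighting step such as the self-attention fusion described in the abstract would have to resolve.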

Keywords: coal mine safety accidents; entity extraction; lexical information; ontology model; entity annotation; named entity recognition

Classification code: TD67 [Mining Engineering: Mine Electromechanics]

 
