Authors: LYU Huilin (吕惠林); DONG Jiayao (董佳瑶); YUAN Lin (袁林); LI Li (李利) [2]
Affiliations: [1] CCTEG Changzhou Research Institute (中煤科工集团常州研究院有限公司), Changzhou 213015, Jiangsu, China; [2] College of Electrical and Control Engineering, Xi'an University of Science and Technology (西安科技大学电气与控制工程学院), Xi'an 710054, Shaanxi, China; [3] Huoshaopu Coal Mine, Guizhou Panjiang Refined Coal Co., Ltd. (贵州盘江精煤股份有限公司火烧铺矿), Liupanshui 553000, Guizhou, China
Source: Journal of Mine Automation (《工矿自动化》), 2025, No. 4, pp. 131-139 (9 pages)
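Funding: National Key Research and Development Program of China (2023YFC3009800); Scientific Research Program of the Shaanxi Provincial Department of Education (23JK0152); Natural Science Basic Research Program of Shaanxi Province (2024JC-YBQN-0726, 2023-JC-QN-0001); Shaanxi Qinchuangyuan "Scientist + Engineer" Team Construction Project (2022KXJ-38).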
Abstract: Named entity recognition (NER) is a foundational task in constructing knowledge graphs for coal mine safety accidents, but Chinese text lacks explicit word boundaries, so existing entity extraction models make insufficient use of lexical information. To address this, a RoBERTa-BiLSTM-CRF model integrated with lexical information is proposed for entity extraction in coal mine safety accidents. First, a domain-specific lexicon for coal mine safety is constructed; character feature vectors are obtained with RoBERTa, the Aho-Corasick (AC) automaton algorithm matches characters against the lexicon to find the candidate words for each character, and word feature vectors are obtained with GloVe. Then, a self-attention mechanism assigns weights to fuse the RoBERTa-based character feature vectors with the GloVe-based word feature vectors, yielding a fused vector that carries lexical information. Finally, the fused vector is fed into BiLSTM-CRF to obtain the optimal predicted sequence, completing entity extraction for coal mine safety accidents. Experimental results show that: (1) the RoBERTa-BiLSTM-CRF model integrated with lexical information achieves an F1 score of 91.63% on 12 entity types in the coal mine safety domain, 1.63% higher than the RoBERTa-BiLSTM-CRF model. (2) The proposed model outperforms the other compared models both on the overall entity extraction task and on the individual entity types, indicating that its architecture is broadly applicable across entity types.
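The character-to-word matching step named in the abstract can be illustrated with a minimal pure-Python Aho-Corasick sketch. It is not the authors' implementation: the function names (`build_automaton`, `match_words`), the four-word lexicon, and the sample sentence are all assumptions made for illustration. It only shows how one pass over a sentence can collect, for every character, the lexicon words that cover it.

```python
from collections import deque


def build_automaton(words):
    """Build an Aho-Corasick automaton as (goto, fail, out) lists."""
    goto, fail, out = [{}], [0], [[]]
    # Insert every lexicon word into the trie.
    for word in words:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)
    # Breadth-first pass to set failure links and inherit outputs.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, nxt in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def match_words(text, automaton):
    """Return, for each character position, the lexicon words covering it."""
    goto, fail, out = automaton
    per_char = [[] for _ in text]
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            # The word ends at position i; mark every character it spans.
            for j in range(i - len(word) + 1, i + 1):
                per_char[j].append(word)
    return per_char


# Hypothetical lexicon and sentence, not taken from the paper.
lexicon = ["瓦斯", "瓦斯爆炸", "爆炸", "采煤工作面"]
text = "采煤工作面发生瓦斯爆炸"
candidates = match_words(text, build_automaton(lexicon))
for ch, words in zip(text, candidates):
    print(ch, words)
```

In a pipeline like the one the abstract describes, each character's candidate list would then be mapped to word vectors and fused with that character's vector; that fusion step is not shown here.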