Named Entity Recognition of Mechanical Equipment Failure for Imbalanced Data

Authors: DANG Xiaochao (党小超); LIU Jian (刘涧); DONG Xiaohui (董晓辉); ZHU Zhongyan (祝忠彦) [2]; LI Fenfang (李芬芳)

Affiliations: [1] School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730000, Gansu, China; [2] Longshou Mine, Jinchuan Group Co., Ltd., Jinchang 737103, Gansu, China

Source: Computer Engineering (《计算机工程》), 2024, No. 9, pp. 104-112 (9 pages)

Funding: National Natural Science Foundation of China (62162056); Gansu Province Industrial Support Program (021CYZC-06).

Abstract: Named entity recognition (NER) is a foundational task in knowledge graph construction, and its accuracy directly affects the quality of the resulting graph. In real production settings, mechanical failure data typically contain a large amount of domain-specific vocabulary, and entity types are usually unevenly distributed, which makes failure entities hard to recognize accurately. General-domain NER methods perform poorly on such data, which in turn lowers knowledge graph quality. To address these problems, this paper proposes an entity recognition method that combines the Focal Loss function with a domain dictionary. Focal Loss handles the entity type imbalance: it modifies the conventional cross-entropy loss by introducing a balancing factor and a modulating coefficient. In addition, domain vocabulary is embedded into the model; the dictionary contains mechanical failure terminology and helps the model recognize mechanical equipment failure entities more accurately. Experiments on a self-built mine hoist dataset show that adding Focal Loss raises the F1 score by 5.57 percentage points over the mainstream Bidirectional Encoder Representations from Transformers (BERT)-Bidirectional Long Short-Term Memory (BiLSTM)-Conditional Random Field (CRF) model, and that it outperforms the Synthetic Minority Over-sampling Technique (SMOTE), a typical method for handling imbalanced data. Embedding the domain dictionary on top of this further improves the F1 score to 89.13%.
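
The abstract describes Focal Loss as cross-entropy modified by a balancing factor and a modulating coefficient. A minimal sketch of that idea follows, assuming the commonly used form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); the paper's exact variant and hyperparameters are not given in the abstract, so the function names, the default `gamma=2.0`, and the example logits are illustrative assumptions only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, alpha=None, gamma=2.0):
    """Focal loss for one token (assumed standard form, not the paper's code).

    alpha: optional per-class balancing factors (hypothetical values).
    gamma: modulating exponent; gamma=0 with alpha=None is cross-entropy.
    """
    p_t = softmax(logits)[target]
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, expressed as the gamma=0 special case."""
    return focal_loss(logits, target, alpha=None, gamma=0.0)


# Illustrative logits for a 3-class tag set; class 0 is the true tag.
easy = [4.0, 0.0, 0.0]   # model is already confident in the true tag
hard = [0.0, 2.0, 0.0]   # model prefers a wrong tag

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: CE={ce:.4f} FL={fl:.4f} FL/CE={fl / ce:.4f}")
```

In this sketch the modulating term (1 - p_t)^gamma shrinks the loss of the confidently classified token far more than that of the misclassified one, so training signal concentrates on hard examples, which tend to come from under-represented entity types. The per-class `alpha` list can additionally up-weight rare types.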

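Keywords: named entity recognition; imbalanced data; Focal Loss function; mechanical equipment failure; bidirectional long short-term memory network; conditional random field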

Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
