Text Entity Extraction of Power Equipment Defects Based on BERT-BI-LSTM-CRF Algorithm  Cited by: 9

Authors: CHEN Peng; TAI Bin; SHI Ying[3]; JIN Yang; KONG Li; WANG Jinfeng

Affiliations: [1] Electric Power Research Institute of Guangdong Power Grid Co., Ltd., Guangzhou 510080, Guangdong Province, China; [2] Key Laboratory of Power Equipment Reliability Enterprises in Guangdong Province, Guangzhou 510080, Guangdong Province, China; [3] School of Automation, Wuhan University of Technology, Wuhan 430070, Hubei Province, China

Source: Power System Technology, 2023, No. 10, pp. 4367-4375 (9 pages)

Funding: Science and Technology Project of China Southern Power Grid Company (036100KK52200021 (GDKJXM20200443)).

Abstract: With the full-scale construction of the smart grid, a large number of power equipment defect texts have been generated. These texts contain key information such as fault types, fault causes and defect elimination methods, and have become a research hotspot in the electric power field. However, defect texts are large in volume, multi-source and heterogeneous, and cluttered and redundant in content, and there is currently no method for integrating and utilizing them efficiently. To address these problems, this paper studies named entity recognition based on the BERT (bidirectional encoder representations from transformers) model. On the one hand, a bidirectional long short-term memory (Bi-LSTM) layer is added to further extract the semantic information of the text; on the other hand, a conditional random field (CRF) replaces the output layer of BERT, which overcomes the local-optimum problem of the predicted labels. Finally, the two strategies are combined into an improved BERT algorithm that joins BERT with the Bi-LSTM and the CRF to perform named entity recognition on defect texts. Experimental results show that the improved BERT algorithm achieves high F1 values (the harmonic mean of precision and recall) on all 7 entity types. Compared with BERT alone, the overall precision and recall of entity extraction are improved by 0.94% and 0.95%, respectively.
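The claim that a CRF output layer avoids the local-optimum problem of per-token label prediction can be illustrated with Viterbi decoding. The sketch below is not the authors' implementation: it assumes the Bi-LSTM has already produced per-token emission scores, uses a hypothetical BIO tag set with one placeholder entity type (DEV), and hand-sets the transition and start scores that a trained CRF would learn, using a large negative score to forbid an I- tag that does not follow a B- or I- tag. Greedy per-token argmax (roughly what a softmax output layer does) is compared with the globally best sequence found by Viterbi.

```python
from typing import List, Sequence

NEG = -1e4  # assumed large penalty marking a forbidden transition

# Hypothetical BIO tag set with one placeholder entity type "DEV".
LABELS = ["O", "B-DEV", "I-DEV"]


def greedy_decode(emissions: Sequence[Sequence[float]]) -> List[int]:
    """Pick the best label for each token independently (per-token argmax)."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in emissions]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   start: Sequence[float]) -> List[int]:
    """Return the label sequence y maximizing
    start[y0] + sum_t emissions[t][y_t] + sum_{t>0} transitions[y_{t-1}][y_t].
    """
    n = len(start)
    # best[j]: score of the best partial path ending in label j at the current token
    best = [start[j] + emissions[0][j] for j in range(n)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        ptrs: List[int] = []
        for j in range(n):
            # Best predecessor label i for current label j
            i_star = max(range(n), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
            ptrs.append(i_star)
        best = new_best
        backpointers.append(ptrs)
    # Trace back from the best final label
    path = [max(range(n), key=lambda j: best[j])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path


# Hand-set (not learned) transition scores transitions[prev][curr]: O -> I-DEV is forbidden.
TRANSITIONS = [
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.0],  # from B-DEV
    [0.0, 0.0, 0.0],  # from I-DEV
]
START = [0.0, 0.0, NEG]  # a sequence may not start with I-DEV

# Made-up emission scores for a 3-token sentence (rows: tokens, columns: LABELS).
EMISSIONS = [
    [2.0, 1.0, 0.5],
    [0.2, 0.8, 1.0],
    [0.1, 0.2, 1.5],
]

greedy_path = greedy_decode(EMISSIONS)
crf_path = viterbi_decode(EMISSIONS, TRANSITIONS, START)
print("greedy :", [LABELS[i] for i in greedy_path])
print("viterbi:", [LABELS[i] for i in crf_path])
```

With these made-up scores, greedy decoding yields O, I-DEV, I-DEV, an invalid tag sequence, whereas Viterbi yields O, B-DEV, I-DEV, the highest-scoring sequence once transitions are taken into account. Training the CRF, i.e. learning the transition scores from the sequence log-likelihood, is omitted here.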

Keywords: power equipment defect text; named entity extraction; improved BERT algorithm; semantic information; output layer; local optimum

CLC number: TM721 [Electrical Engineering: Power Systems and Automation]