Authors: HE Xinyi (贺馨仪); DONG Ming (董明) [1]; YAN Yong (颜拥); YAO Ying (姚影); HUANG Jianping (黄建平)
Affiliations: [1] State Key Laboratory of Electrical Insulation for Power Equipment, Xi'an Jiaotong University, Xi'an 710049, Shaanxi Province, China; [2] Electric Power Research Institute, State Grid Zhejiang Electric Power Co., Ltd., Hangzhou 310007, Zhejiang Province, China
Source: Electric Power Information and Communication Technology (《电力信息与通信技术》), 2024, No. 11, pp. 52-59 (8 pages)
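Funding: Supported by a State Grid Corporation of China headquarters science and technology project, "Research on the Implementation Path and Key Technologies of Standards Digitalization for State Grid Corporation" (5700-202241437A-2-0-ZN).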
Abstract: In recent years, high-quality development and digital transformation have become increasingly important in the power industry. This creates new requirements for research on the digital transformation of power standards and brings new challenges and opportunities for their management, implementation, and supervision. The power sector is an important pillar of social and economic development, and its terminology and proper nouns are highly specific and complex. When applied to power standard documents, traditional named entity recognition methods based on rules and feature engineering suffer from low recognition accuracy, difficulty in segmenting terms, and reliance on expert experience. To overcome these problems, the paper proposes an improved BERT-based named entity recognition model. By introducing an in-domain power terminology corpus, word features, and lexical information, the model recognizes 10 types of power entities on a power standard corpus and reaches an F1 score of 81%. It effectively recognizes long terminology entities in the power domain, improves the efficiency and accuracy of processing power standard documents, and supports the information processing and application of power standards. The research can advance the automated processing of power standard documents, raise the digitalization level of the power industry, and provide technical support for specification development, knowledge management, and decision support in the industry.
Keywords: named entity recognition; standards digitalization; natural language processing; power standards
Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]