Study on Named Entity Recognition Method Based on Knowledge Graph Enhancement

Cited by: 1


Authors: GAO Xiang; TANG Jiqiang[3]; ZHU Junwu[1]; LIANG Mingxuan[1,2]; LI Yang

Affiliations: [1] College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225000, China; [2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; [3] National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China

Source: Computer Science (《计算机科学》), 2023, Issue S01, pp. 102-107 (6 pages)

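Funding: National 242 Information Security Program (2021A008); Beijing Nova Program Interdisciplinary Cooperation Project (Z191100001119014); National Key Research and Development Program of China, Key Special Projects (2017YFC1700300, 2017YFB1002300); National Natural Science Foundation of China (61702234).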

Abstract: Named entity recognition (NER) is a fundamental task in natural language processing that aims to identify entities and their types in text written in natural language. Knowledge graphs, as external knowledge stored in the form of triples, have been applied to many natural language processing tasks with good results. This paper proposes an attention-aligned NER method enhanced with knowledge graph information. First, knowledge graph information is embedded through an embedding layer and an attention mechanism to obtain a representation of the knowledge graph triples. Second, a contextual representation of the sentence is obtained with BERT-BiLSTM. Third, an attention alignment module assigns weights to the triples and fuses the knowledge graph representation with the sentence representation. Finally, softmax is applied to the fused representation vectors to produce the prediction output and obtain the entity labels. The method avoids altering the semantic information of the original sentence when knowledge graph information is fused in, while also enriching the word vectors of the sentence with external knowledge. On the Chinese general-domain dataset MSRA and the medical-domain dataset Medicine, the proposed method achieves F1 scores of 95.73% and 93.80%, respectively, improvements of 1.21% and 1.3% over the baseline model.
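The fusion and prediction steps described in the abstract can be illustrated with a small sketch. The abstract gives no equations, so everything here is an assumption made for illustration: scaled dot-product scoring between token and triple vectors, additive (residual) fusion, and the hypothetical names `align_and_fuse` and `predict_labels`. Toy vectors stand in for the BERT-BiLSTM outputs and the triple embeddings; this is not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def align_and_fuse(token_vecs, triple_vecs):
    """Weight the triples for each token and fuse them into the token vector.

    Assumption: scaled dot-product attention for the weights, and
    addition of the weighted triple summary to the token vector.
    """
    dim = len(token_vecs[0])
    fused, all_weights = [], []
    for h in token_vecs:
        # One attention weight per triple for this token.
        weights = softmax([dot(h, t) / math.sqrt(dim) for t in triple_vecs])
        # Weighted sum of the triple vectors.
        summary = [sum(w * t[i] for w, t in zip(weights, triple_vecs))
                   for i in range(dim)]
        # The original token vector is kept and the knowledge summary is added.
        fused.append([h_i + s_i for h_i, s_i in zip(h, summary)])
        all_weights.append(weights)
    return fused, all_weights

def predict_labels(fused, label_matrix, labels):
    """Linear projection plus softmax; returns the most probable label per token."""
    predictions = []
    for f in fused:
        probs = softmax([dot(f, w) for w in label_matrix])
        predictions.append(labels[probs.index(max(probs))])
    return predictions

# Toy inputs (hypothetical): two tokens, two triples, two labels.
token_vecs = [[1.0, 0.0], [0.0, 1.0]]
triple_vecs = [[2.0, 0.0], [0.0, 2.0]]
fused, weights = align_and_fuse(token_vecs, triple_vecs)
labels = predict_labels(fused, [[1.0, 0.0], [0.0, 1.0]], ["B-ORG", "O"])
print(labels)
```

Adding the weighted triple summary to the token vector, rather than replacing it, is one simple way to keep the sentence representation intact while injecting external knowledge; the paper's actual alignment module may differ.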

Keywords: named entity recognition; knowledge graph enhancement; attention mechanism; deep learning

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
