Material data named entity recognition based on matching contextual lexical words and graph convolution

Authors: CHEN Qian; WU Xing [1,2,3]

Affiliations: [1] School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; [2] Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China; [3] Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China

Source: Journal of Shanghai University (Natural Science Edition), 2022, No. 3, pp. 372-385 (14 pages)

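Funding: National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Special Project of Yunnan Province (202102AB080019-3, 202002AB080001-2); Zhejiang Laboratory Key Research Project (2021PE0AC02); Major Project of the Special Development Fund of the Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006).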

Abstract: The materials literature contains a wealth of knowledge, and mining it with machine learning and natural language processing is an active research topic. Named entity recognition (NER) is the first step in efficiently mining and extracting information from such data. Existing NER methods have two shortcomings: their vector representations cannot handle words with multiple meanings, and their models often extract contextual features while ignoring global features. To address these problems, a NER method based on contextual lexical matching and graph convolution is proposed. The method first uses XLNet to obtain dynamic contextual features of the text. It then uses a long short-term memory (LSTM) network to obtain contextual features and a graph convolutional network (GCN), built on the lexical words matched in the text's context, to obtain global features. Finally, a conditional random field (CRF) outputs the label sequence. The model is validated on two different corpora. On the materials dataset, the method achieves a precision of 90.05%, a recall of 88.67%, and an F1 score of 89.36%, which shows that it effectively improves named entity recognition accuracy.
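The abstract does not spell out how the lexical-match graph is built or how the GCN is parameterized, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that tokens covered by the same matched lexicon word are connected to each other, that every token has a self-loop, and that one layer uses the common symmetric normalization D^(-1/2) A D^(-1/2) followed by ReLU. The function names (`match_lexicon`, `build_adjacency`, `normalize`, `gcn_layer`), the toy lexicon, and the toy features are all hypothetical; the XLNet, LSTM, and CRF stages are omitted.

```python
import math


def match_lexicon(tokens, lexicon, max_len=4):
    """Return (start, end) spans, end exclusive, of 2..max_len tokens
    whose space-joined text appears in the lexicon."""
    spans = []
    for i in range(len(tokens)):
        for j in range(i + 2, min(len(tokens), i + max_len) + 1):
            if " ".join(tokens[i:j]) in lexicon:
                spans.append((i, j))
    return spans


def build_adjacency(n, spans):
    """Assumed graph: self-loops plus edges between all tokens that
    belong to the same matched lexicon word."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    for start, end in spans:
        for i in range(start, end):
            for j in range(start, end):
                a[i][j] = 1.0
    return a


def normalize(a):
    """Symmetric normalization: a_hat[i][j] = a[i][j] / sqrt(deg_i * deg_j)."""
    deg = [sum(row) for row in a]
    n = len(a)
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph-convolution layer: ReLU(a_hat @ h @ w)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


if __name__ == "__main__":
    tokens = ["a", "shape", "memory", "alloy", "wire"]
    lexicon = {"shape memory alloy"}           # toy lexicon (assumption)
    spans = match_lexicon(tokens, lexicon)
    a_hat = normalize(build_adjacency(len(tokens), spans))
    # Toy 2-dimensional token features standing in for encoder outputs.
    h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
    w = [[1.0, 0.0], [0.0, 1.0]]               # identity weights for clarity
    for token, row in zip(tokens, gcn_layer(a_hat, h, w)):
        print(token, row)
```

With identity weights, each token inside a matched word receives the average of the features of all tokens in that word, while unmatched tokens keep their own features. This is one simple way that lexicon matches can spread information beyond a token's immediate neighbours.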

Keywords: named entity recognition; XLNet; graph convolutional network

Classification: TP391 [Automation and Computer Technology - Computer Application Technology]
