Authors: CHEN Qian (陈茜); WU Xing (武星)[1,2,3]
Affiliations: [1] School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; [2] Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China; [3] Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
Source: Journal of Shanghai University: Natural Science Edition (《上海大学学报(自然科学版)》), 2022, No. 3, pp. 372-385 (14 pages)
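Funding: National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Special Project of Yunnan Province (202102AB080019-3, 202002AB080001-2); Zhejiang Laboratory Key Research Project (2021PE0AC02); Major Project of the Special Development Fund of the Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006).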
Abstract: The materials literature contains a wealth of knowledge, and mining it with machine learning and natural language processing is an active research topic. Named entity recognition (NER) is the first step in efficiently mining and extracting information from such data. Existing NER methods have two shortcomings: their vector representations cannot handle polysemy, and their models tend to extract contextual features while ignoring global features. To address these problems, an NER method based on contextual lexicon matching and graph convolution is proposed. The method first uses XLNet to obtain dynamic contextual features of the text. It then uses a long short-term memory (LSTM) network and a graph convolutional network (GCN) built on words matched from the text's context to obtain contextual features and global features, respectively. Finally, a conditional random field (CRF) outputs the label sequence. The model is validated on two different corpora. On the materials dataset, the method achieves a precision, recall, and F1 score of 90.05%, 88.67%, and 89.36%, respectively, effectively improving NER accuracy.
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
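Note on the reported metrics: the F1 score in the abstract is, by the standard definition, the harmonic mean of precision and recall. The short sketch below recomputes it from the two rounded percentages given in the abstract. The function name `f1_score` and the variable names are illustrative assumptions, not taken from the paper, and the authors presumably computed F1 from unrounded counts, so a small difference in the last digit is expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0  # conventional value when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract for the materials dataset (percent).
reported_precision = 90.05
reported_recall = 88.67
reported_f1 = 89.36

# Recompute F1 from the rounded precision and recall.
recomputed_f1 = f1_score(reported_precision, reported_recall)
difference = abs(recomputed_f1 - reported_f1)

print(f"recomputed F1 = {recomputed_f1:.4f}%, difference = {difference:.4f} points")
```

The recomputed value agrees with the reported 89.36% to within 0.01 percentage points, which is consistent with rounding of the inputs.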