Authors: 王庆人 (WANG Qingren); 王银子 (WANG Yinzi); 仲红 (ZHONG Hong)[1]; 张以文 (ZHANG Yiwen)[1] (College of Computer Science and Technology, Anhui University, Hefei 230093, China)
Affiliation: [1] College of Computer Science and Technology, Anhui University, Hefei 230093, China
Source: 《清华大学学报(自然科学版)》 Journal of Tsinghua University (Science and Technology), 2023, No. 9, pp. 1326-1338 (13 pages)
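Funding: Key Program of the National Natural Science Foundation of China (U1936220); Young Scientists Fund of the National Natural Science Foundation of China (62006003).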
Abstract: As a core task of information extraction, named entity recognition (NER) identifies named entities of different types in text. Thanks to the application of deep learning to character and word representation and to feature extraction, Chinese NER has produced a rich body of research results. However, the task still faces the challenge of insufficient lexical information, which shows up in three ways: 1) word boundary information and contextual semantic information are not fully exploited; 2) the semantic information between characters and self-matching words is not effectively captured; 3) the relative importance of the information from different interaction graphs in the output of the graph attention network is not considered. This paper proposes a character-word combination sequence method for Chinese entity recognition. A character-word combination sequence embedding structure captures word boundary information and the semantic information between characters and words; a multi-graph attention fusion architecture distinguishes the importance of the features extracted by different graph neural networks. Experiments show that, compared with existing classical methods, the method clearly improves F1 on four datasets (Weibo, Resume, OntoNotes 4.0, and MSRA) and is feasible for Chinese NER.
[Objective] As the core task of information extraction, named entity recognition recognizes various types of named entities in text. Chinese NER has benefited from the application of deep learning to character and word representation, feature extraction, and other aspects, and has achieved rich results. However, the task still faces a lack of vocabulary information, which has been regarded as one of the primary impediments to developing a high-performance Chinese NER system. While an automatically constructed dictionary contains rich lexical boundary information and lexical semantic information, integrating word knowledge into the Chinese NER task still faces challenges, such as effectively integrating the semantic information of self-matching words and their context information into Chinese characters. Furthermore, although graph neural networks can extract feature information from various character-word interaction graphs, how to fuse these features into the original input sequence according to the importance of the information from each interaction graph remains unsolved.
[Methods] This paper proposes a Chinese entity recognition method based on a character-word combination sequence. (1) First, the method proposes a character-word combination sequence embedding structure that primarily uses self-matching words to replace the Chinese characters in the character sequence under consideration. To make full use of the self-matching vocabulary information, we also constructed a sequence for the self-matching vocabulary and vectorized the vocabulary and Chinese characters. At the encoding level, we obtained the context information of the character sequence, the vocabulary sequence, and the character-word combination sequence using the BiLSTM model and then fused the information from the words in the character-word combination sequence into the correspond
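Keywords: natural language processing; named entity recognition; graph attention network; character-word combination embedding; multi-graph attention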
Classification number: TP393.1 [Automation and Computer Technology - Computer Application Technology]