Authors: Shi Hanyu (施寒瑜), Qu Weiguang (曲维光)[1,2], Wei Tingxin (魏庭新), Zhou Junsheng (周俊生)[1], Gu Yanhui (顾彦慧)
Affiliations: [1] School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing 210023, Jiangsu, China; [2] School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, Jiangsu, China; [3] International College for Chinese Studies, Nanjing Normal University, Nanjing 210097, Jiangsu, China
Source: Journal of Nanjing Normal University (Natural Science Edition) (《南京师大学报(自然科学版)》), 2022, No. 1, pp. 127-135 (9 pages)
Funding: National Natural Science Foundation of China (61772278, 61472191); National Social Science Fund of China (21&ZD288, 18BYY127).
Abstract: Quantity noun phrase recognition is the task of identifying the left and right boundaries of noun phrases modified by quantity phrases. In previous work, quantity phrase recognition methods based on statistical learning models relied on hand-crafted features and required experts to build knowledge bases in order to recognize "numeral + classifier" phrases. Building on that work, this paper brings the noun into scope, giving eight types of quantity noun phrase such as "numeral + classifier + noun", and applies deep learning to the boundary recognition task. First, BERT produces contextual feature representations of the raw text. Then, following the character-word combination idea of the Lattice LSTM model, the standard word segmentation is incorporated as a soft feature into the character-level representation of the text. Finally, a CRF applies global constraints to identify the boundaries of quantity noun phrases. Experiments show that the method achieves comparatively good results on the CAMR (Chinese AMR) corpus, with precision, recall, and F1 of 80.83%, 89.78%, and 85.07%, respectively.
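The three scores reported in the abstract can be checked against each other, since F1 is the harmonic mean of precision and recall. The following is a minimal sketch of that arithmetic only; the function name `f1_score` is chosen here for illustration and is not taken from the paper, and the paper's own evaluation script is not available in this record.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Both arguments must use the same scale (here, percent).
    Illustrative helper; not the authors' evaluation code.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 80.83
reported_recall = 89.78

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 85.07, matching the F1 reported in the abstract
```

The recomputed value agrees with the reported F1 to two decimal places, so the three figures are internally consistent.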
Keywords: quantity noun phrase recognition; BERT; Lattice LSTM; CRF
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]