Quantity Noun Phrase Structure Recognition Based on Combined Deep Learning Model (original Chinese title: 基于组合深度模型的现代汉语数量名短语识别)

Cited by: 2


Authors: Shi Hanyu (施寒瑜), Qu Weiguang (曲维光) [1,2], Wei Tingxin (魏庭新), Zhou Junsheng (周俊生) [1], Gu Yanhui (顾彦慧)

Affiliations: [1] School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing 210023, Jiangsu, China; [2] School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, Jiangsu, China; [3] International College for Chinese Studies, Nanjing Normal University, Nanjing 210097, Jiangsu, China

Source: Journal of Nanjing Normal University (Natural Science Edition), 2022, No. 1, pp. 127-135 (9 pages)

Funding: National Natural Science Foundation of China (61772278, 61472191); National Social Science Fund of China (21&ZD288, 18BYY127).

Abstract: Quantity noun phrase recognition is the task of identifying the left and right boundaries of noun phrases that are modified by a quantity phrase. Previous work focused on quantity phrases of the form "numeral + classifier" and used statistical learning models that depend on hand-crafted features and on knowledge bases built from expert knowledge. This paper extends the target by including the noun, giving eight subtypes of quantity noun phrase such as "numeral + classifier + noun", and treats the boundary recognition task with a deep learning approach. First, BERT produces contextual feature representations of the raw text. Then, following the character-word combination idea of the Lattice LSTM model, standard word segmentation is incorporated as a soft feature into the character-level representation. Finally, a CRF layer applies a global constraint to identify the phrase boundaries. Experiments on the Chinese AMR (CAMR) corpus show relatively good results, with precision, recall, and F1 of 80.83%, 89.78%, and 85.07%, respectively.
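The boundary recognition task and the reported scores can be illustrated with a minimal sketch. It assumes a character-level B/I/O tagging scheme and exact-match span scoring, neither of which the abstract specifies; the function names and the example sentence are hypothetical and are not taken from the paper or its corpus.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect phrase spans from a B/I/O tag sequence (assumed scheme)."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":                 # a new phrase starts here
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":               # outside any phrase: close the open span
            if start is not None:
                spans.append((start, i))
            start = None
        # "I" extends the open span; a stray "I" with no open span is ignored
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def span_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over predicted spans."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, f1_score(p, r)


# Invented example: "我买了三本书" ("I bought three books").
sentence = "我买了三本书"
tags = ["O", "O", "O", "B", "I", "I"]   # hypothetical model output
spans = bio_to_spans(tags)
print([sentence[s:e] for s, e in spans])  # ['三本书'] (numeral + classifier + noun)

# The reported precision and recall are consistent with the reported F1.
print(round(100 * f1_score(0.8083, 0.8978), 2))  # 85.07
```

The final line only checks the arithmetic of the reported figures; it does not reproduce the experiment.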

Keywords: quantity noun phrase recognition; BERT; Lattice LSTM; CRF

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
