Research on Chinese V+V Sequence Relation Recognition (汉语V+V序列关系识别研究)


Authors: LI Shengnan (李胜男); QU Weiguang (曲维光) [1,2]; WEI Tingxin (魏庭新); ZHOU Junsheng (周俊生) [1]; GU Yanhui (顾彦慧); LI Bin (李斌) [2]

Affiliations: [1] School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing 210023, China; [2] School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China; [3] International College for Chinese Studies, Nanjing Normal University, Nanjing 210097, China

Source: Computer Engineering and Applications (《计算机工程与应用》), 2023, No. 5, pp. 289-296 (8 pages)

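Funding: National Natural Science Foundation of China, General Program (61772278); Philosophy and Social Science Fund of Jiangsu Higher Education Institutions, General Project (2019JSA0220); National Social Science Fund of China, General Project (18BYY127).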

Abstract: "V+V" is one of the most common structures in modern Chinese. Because the nouns and verbs in it can bear different semantic roles, "V+V" can form entirely different syntactic structures, such as concurrent structures and serial verb structures, which makes syntactic and semantic parsing difficult. To identify the syntactic types and sequence relations that "V+V" forms, the paper designs an annotation specification that handles nested "V+V" structures and uses it to build a "V+V" corpus containing 5,381 concurrent sentences, 7,987 serial verb sentences, and 1,212 sentences in which concurrent and serial verb structures are nested. It then proposes a model based on BiLSTM-CRF and multi-head attention that simultaneously identifies the syntactic and semantic roles of the multiple verbs and nouns in the structure. Whereas previous work recognized either concurrent structures or serial verb structures in isolation, this model identifies both within a unified framework and also handles nested concurrent and serial verb structures, which earlier work did not address. Experiments on the constructed corpus show that the model reaches an F1 score of 92.12% on the test set.
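To make the sequence-labeling idea concrete, the following is a minimal sketch of Viterbi decoding, the step a linear-chain CRF layer typically uses to pick the best tag sequence. It is not the paper's implementation: the tag names (`O`, `V1`, `PIVOT`, `V2`), the example sentence, and all scores are hypothetical and chosen only for illustration, and the BiLSTM encoder and multi-head attention that would produce the emission scores are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical role tags, for illustration only; the abstract does not
# specify the paper's actual label set.
TAGS = ["O", "V1", "PIVOT", "V2"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str] = TAGS) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag] is the per-token score (in a BiLSTM-CRF it would come
    from the encoder); transitions[(prev, cur)] is the tag-to-tag score.
    Missing transition entries default to 0.0.
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy input for the illustrative sentence 我 / 请 / 他 / 吃饭 ("I invite him to eat").
# All scores are made up. On emissions alone, 他 slightly prefers "O".
emissions = [
    {"O": 2.0, "V1": 0.0, "PIVOT": 0.0, "V2": 0.0},  # 我
    {"O": 0.0, "V1": 2.0, "PIVOT": 0.0, "V2": 0.5},  # 请
    {"O": 1.0, "V1": 0.0, "PIVOT": 0.8, "V2": 0.0},  # 他
    {"O": 0.0, "V1": 0.5, "PIVOT": 0.0, "V2": 2.0},  # 吃饭
]
# Transition scores reward the V1 -> PIVOT -> V2 pattern.
transitions = {("V1", "PIVOT"): 1.0, ("PIVOT", "V2"): 1.0}

path = viterbi(emissions, transitions)
print(path)  # ['O', 'V1', 'PIVOT', 'V2']
```

In this toy case the transition scores override the local preference of 他 for `O`, so the noun shared by both verbs is tagged `PIVOT`. This shows why a CRF layer, which scores whole tag sequences rather than individual tokens, is a natural fit for labeling the roles inside a "V+V" structure jointly.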

Keywords: V+V sequence relation; serial verb structure; concurrent structure; Chinese Abstract Meaning Representation

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
