Sentence similarity computation based on syntactic dependency convolutional neural network (基于句法依存卷积神经网络的句子相似度计算)

Cited by: 2

Authors: XUAN Jing (铉静); WU Qiong (吴琼); WEI Congyue (魏从悦); WU Xing (伍星) [1]

Affiliations: [1] College of Computer Science, Chongqing University, Chongqing 400044, P.R. China; [2] College of Management Science and Engineering, Chongqing Technology and Business University, Chongqing 400067, P.R. China

Source: Journal of Chongqing University (Natural Science Edition) (《重庆大学学报（自然科学版）》), 2020, No. 9, pp. 41-53 (13 pages)

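Funding: Supported by the Open Fund Projects of Chongqing Technology and Business University (KFJJ2019056, KFJJ2019050); the Outstanding and Excellent Doctoral Talent Program of Chongqing Technology and Business University (2056001); and the Discipline Construction Project in Data and Information Management of Chongqing Technology and Business University (ZDPTTD201917).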

Abstract: Sentence similarity computation is a basic task in natural language processing, and its accuracy directly affects the performance of downstream tasks such as machine translation, plagiarism detection, query ranking and question answering. Traditional machine learning methods compute sentence similarity mainly from shallow features such as morphology, word order and structure, whereas deep learning methods can incorporate deep semantic features and achieve better results. However, deep learning methods such as convolutional neural networks have a narrow receptive field when extracting text features: the sentence semantics they extract are relatively shallow, and they capture too little long-distance dependency information. To address this, a DCNN (dependency convolutional neural network) model was designed that exploits the dependency relations between words. The model first obtains the dependency relations between the words of a sentence through dependency parsing with Stanford NLP, then combines each word with the words that are one hop or two hops away from it to form two-word and three-word combinations. These two sets of combinations supplement the original sentence as input to the convolutional neural network, which thereby captures long-distance dependencies between words.

Experimental results show that the long-distance dependency information obtained from dependency parsing effectively improves model performance. On the MSRP (Microsoft Research Paraphrase Corpus) dataset, the model achieves an accuracy of 80.33% and an F1 value of 85.91. Its Pearson correlation coefficient reaches 87.5 on the SICK (Sentences Involving Compositional Knowledge) dataset and 92.2 on the MSRvid (Microsoft Video Paraphrase Corpus) dataset.
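The combination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the combinations are built, so the sketch assumes that a one-hop combination pairs a word with its dependency head and a two-hop combination adds the head's own head. The function name `dependency_ngrams`, the `ROOT` marker and the hand-written parse are all hypothetical; in practice the head indices would come from a dependency parser.

```python
from typing import List, Tuple

ROOT = -1  # hypothetical marker: head index assigned to the root word


def dependency_ngrams(
    words: List[str], heads: List[int]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Build (word, head) pairs and (word, head, grandparent) triples.

    heads[i] is the index of the dependency head of words[i],
    or ROOT if words[i] is the root of the sentence.
    """
    bigrams, trigrams = [], []
    for i, word in enumerate(words):
        parent = heads[i]
        if parent == ROOT:
            continue  # the root has no head, so it starts no combination
        # One hop: the word and its direct head.
        bigrams.append((word, words[parent]))
        grandparent = heads[parent]
        if grandparent != ROOT:
            # Two hops: the word, its head, and the head's head.
            trigrams.append((word, words[parent], words[grandparent]))
    return bigrams, trigrams


# Hand-written parse (an assumption for illustration, not parser output).
words = ["the", "cat", "that", "I", "saw", "sleeps"]
heads = [1, 5, 4, 4, 1, ROOT]
bigrams, trigrams = dependency_ngrams(words, heads)
```

In this example the pair `("cat", "sleeps")` links two words that are four positions apart in the sentence, which is the kind of long-distance relation a narrow convolution window over the raw word sequence would miss.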

Keywords: sentence similarity; syntactic dependency tree; long-distance dependency

CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
