Similarity Calculation for Chinese Text Based on Dependency Graph Convolution

Cited by: 1

Authors: HU Shulin; ZHANG Huajun [1]; DENG Xiaotao; WANG Zhenghua (School of Automation, Wuhan University of Technology, Wuhan 430070, Hubei, China; Wuhan DaSoundGen Technologies Co., Ltd., Wuhan 430075, Hubei, China)

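Affiliations: [1] School of Automation, Wuhan University of Technology, Wuhan 430070, Hubei, China; [2] Wuhan DaSoundGen Technologies Co., Ltd., Wuhan 430075, Hubei, China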

Source: Computer Engineering (《计算机工程》), 2025, No. 3, pp. 76-85 (10 pages)

Funding: Key Research and Development Program of Hubei Province (2022BAA051).

Abstract: Current Chinese text similarity methods use word-embedding techniques to judge similarity at the semantic level, but they usually overlook the rich syntactic structure of the text. Chinese syntactic parsing operates on words, whereas dynamic word-embedding models tokenize at the character level; because of this granularity mismatch, most studies that incorporate syntactic analysis can only use static word embeddings to represent word semantics. To address this problem, this study builds a dependency graph from dependency parsing and uses word-segmentation mask mapping with attention-mix pooling so that dynamic word embeddings represent the semantic features of word nodes. A graph convolutional network then extracts the dependency relations among the word nodes, and the dependency graph is finally read out as the sentence feature vector, so that sentence similarity is computed at both the semantic and syntactic levels. The method is applied to two model structures, representation-based and interaction-based, and evaluated on the BQ_Corpus and ATEC datasets. The highest accuracies reach 87.12% and 88.33%, respectively, and every evaluation metric improves once dependency syntax information is incorporated.
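The pipeline summarized in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not define attention-mix pooling, so the sketch assumes it means averaging a masked attention pooling with a masked mean pooling, and it assumes a single GCN layer with symmetric normalization and a mean readout. All function names, the word spans, the head indices, and the random vectors standing in for character-level embeddings are hypothetical.

```python
import numpy as np

def word_mask(spans, num_chars):
    """M[w, c] = 1 if character c belongs to word w (spans are [start, end))."""
    mask = np.zeros((len(spans), num_chars))
    for w, (start, end) in enumerate(spans):
        mask[w, start:end] = 1.0
    return mask

def attention_mix_pool(char_emb, mask, query):
    """Pool character vectors into word vectors.

    Assumption: "mix" = average of masked attention pooling and masked mean pooling.
    """
    scores = char_emb @ query                          # one score per character
    logits = np.where(mask > 0, scores[None, :], -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)                           # characters outside the word get 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    attn = weights @ char_emb
    mean = (mask @ char_emb) / mask.sum(axis=1, keepdims=True)
    return 0.5 * (attn + mean)

def dependency_adjacency(heads):
    """Undirected adjacency from head indices; heads[i] = -1 marks the root."""
    n = len(heads)
    adj = np.zeros((n, n))
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child, head] = adj[head, child] = 1.0
    return adj

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)

def sentence_vector(char_emb, spans, heads, query, weight):
    """Character embeddings -> word nodes -> GCN -> mean readout."""
    words = attention_mix_pool(char_emb, word_mask(spans, len(char_emb)), query)
    hidden = gcn_layer(dependency_adjacency(heads), words, weight)
    return hidden.mean(axis=0)

def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Toy usage: random vectors stand in for character-level encoder outputs.
rng = np.random.default_rng(0)
dim = 8
query, weight = rng.normal(size=dim), rng.normal(size=(dim, dim))
# A 4-character sentence segmented into 3 words; word 1 is the root.
sent_a = sentence_vector(rng.normal(size=(4, dim)), [(0, 1), (1, 2), (2, 4)],
                         [1, -1, 1], query, weight)
# A 5-character sentence segmented into 3 words; word 1 is the root.
sent_b = sentence_vector(rng.normal(size=(5, dim)), [(0, 2), (2, 3), (3, 5)],
                         [1, -1, 1], query, weight)
similarity = cosine(sent_a, sent_b)
```

The mask matrix is what reconciles the two granularities: the encoder can stay character-level while the graph nodes are words. In a real system the spans and heads would come from a word segmenter and dependency parser, and `query` and `weight` would be learned.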

Keywords: graph convolutional neural network; dependency parsing; dynamic word embedding; text similarity; attention mechanism

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
