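Authors: HU Shulin, ZHANG Huajun[1], DENG Xiaotao, WANG Zhenghua (School of Automation, Wuhan University of Technology, Wuhan 430070, Hubei, China; Wuhan DaSoundGen Technologies Co., Ltd., Wuhan 430075, Hubei, China)
Affiliations: [1] School of Automation, Wuhan University of Technology, Wuhan 430070, Hubei, China; [2] Wuhan DaSoundGen Technologies Co., Ltd., Wuhan 430075, Hubei, China
Source: Computer Engineering (《计算机工程》), 2025, No. 3, pp. 76-85 (10 pages)
Funding: Key Research and Development Program of Hubei Province (2022BAA051).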
Abstract: Current Chinese text similarity methods use word embeddings to judge similarity at the semantic level, but they usually overlook the rich syntactic structure of the text. Chinese syntactic analysis operates on words, whereas dynamic word-embedding models tokenize at the character level; because of this granularity mismatch, most studies that incorporate syntactic analysis can only use static word embeddings to represent word semantics. To address this issue, this study builds a dependency graph from dependency parsing and uses word-segmentation mask mapping together with attention-mix pooling so that dynamic word embeddings can represent the semantic features of word nodes. A graph convolutional network then extracts the dependency relations among the word nodes, and a readout of the dependency graph serves as the sentence feature vector, so that sentence similarity is computed at both the semantic and the syntactic level. The method is applied to two model structures, representation-based and interaction-based, and evaluated on the BQ_Corpus and ATEC datasets. The models reach highest accuracies of 87.12% and 88.33%, respectively, and every evaluation metric improves once dependency syntax information is incorporated.
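Keywords: graph convolutional network; dependency parsing; dynamic word embedding; text similarity; attention mechanism
Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]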
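The pipeline summarized in the abstract (character vectors grouped into word nodes, graph convolution over the dependency graph, readout, similarity) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the abstract does not give the attention-mix pooling formula, so `pool_word` simply blends mean pooling with dot-product attention pooling using a hypothetical `alpha` and `query`; the graph convolution uses the standard symmetric normalization without learned weights; and the character vectors and dependency edges are toy values standing in for the outputs of a character-level encoder and a dependency parser.

```python
import math

def word_spans(words):
    """Segmentation mask: the half-open range of character indices each word covers."""
    spans, start = [], 0
    for w in words:
        spans.append((start, start + len(w)))
        start += len(w)
    return spans

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def pool_word(char_vecs, query, alpha=0.5):
    """Assumed 'mix' pooling: alpha * mean pooling + (1 - alpha) * attention pooling."""
    dim, n = len(char_vecs[0]), len(char_vecs)
    mean = [sum(v[d] for v in char_vecs) / n for d in range(dim)]
    weights = softmax([sum(a * b for a, b in zip(v, query)) for v in char_vecs])
    attn = [sum(w * v[d] for w, v in zip(weights, char_vecs)) for d in range(dim)]
    return [alpha * m + (1 - alpha) * a for m, a in zip(mean, attn)]

def gcn_layer(node_vecs, edges):
    """One step of ReLU(D^-1/2 (A + I) D^-1/2 H), with no learned weight matrix."""
    n, dim = len(node_vecs), len(node_vecs[0])
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for head, dep in edges:  # treat dependency arcs as undirected edges
        adj[head][dep] = adj[dep][head] = 1.0
    deg = [sum(row) for row in adj]
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if adj[i][j]:
                coef = adj[i][j] / math.sqrt(deg[i] * deg[j])
                for d in range(dim):
                    row[d] += coef * node_vecs[j][d]
        out.append([max(0.0, x) for x in row])
    return out

def readout(node_vecs):
    """Mean over word nodes gives one vector for the whole sentence."""
    n = len(node_vecs)
    return [sum(v[d] for v in node_vecs) / n for d in range(len(node_vecs[0]))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sentence_vector(words, char_vecs, edges, query):
    nodes = [pool_word(char_vecs[s:e], query) for s, e in word_spans(words)]
    return readout(gcn_layer(nodes, edges))

# Toy inputs (made up): one 2-d vector per character, arcs as (head, dependent) word indices.
query = [0.5, 0.5]
vec_a = sentence_vector(["我", "喜欢", "苹果"],
                        [[0.1, 0.9], [0.8, 0.2], [0.7, 0.3], [0.2, 0.6], [0.3, 0.5]],
                        [(1, 0), (1, 2)], query)
vec_b = sentence_vector(["我", "爱", "苹果"],
                        [[0.1, 0.9], [0.9, 0.1], [0.2, 0.6], [0.3, 0.5]],
                        [(1, 0), (1, 2)], query)
print(cosine(vec_a, vec_b))
```

This corresponds most closely to the representation-based setting, where each sentence is encoded independently and the two sentence vectors are compared afterwards; a trained model would add learned weights to the pooling and graph convolution steps.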