Sentence Embedding Optimization Based on Manifold Learning (基于流形学习的句向量优化)

Authors: WU Mingyue (吴明月), ZHOU Dong (周栋) [1], ZHAO Wenyu (赵文玉) [1,2], QU Wei (屈薇)

Affiliations: [1] School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China; [2] Hunan Key Laboratory for Service Computing and Novel Software Technology (Hunan University of Science and Technology), Xiangtan, Hunan 411201, China

Source: Journal of Computer Applications (《计算机应用》), 2023, No. 10, pp. 3062-3069 (8 pages)

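Funding: National Natural Science Foundation of China (61876062); Natural Science Foundation of Hunan Province (2022JJ30020); Scientific Research Project of the Hunan Provincial Department of Education (21A0319).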

Abstract: Sentence embedding is one of the core technologies of natural language processing and affects the quality and performance of natural language processing systems. However, existing methods cannot efficiently infer the global semantic relationships between sentences, so measuring the semantic similarity of sentences in Euclidean space remains problematic. To address this issue, a sentence embedding optimization method based on manifold learning is proposed, starting from the local geometric structure of sentences. The method uses Locally Linear Embedding (LLE) to perform two rounds of weighted local linear combination over each sentence and its semantically similar sentences. This preserves the local geometric information between sentences and helps infer the global geometric information, so that the semantic similarity of sentences in Euclidean space more closely matches human semantic judgments. Experimental results on seven text semantic similarity tasks show that the average Spearman's Rank Correlation Coefficient (SRCC) of the proposed method is 1.21 percentage points higher than that of SimCSE (Simple Contrastive learning of Sentence Embeddings), a contrastive learning-based method. In addition, when the proposed method is applied to mainstream pre-trained models, the average SRCC of the optimized models is 3.32 to 7.70 percentage points higher than that of the original pre-trained models.
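The abstract does not give the algorithmic details, so the following is only a rough sketch of what an LLE-style weighted local linear combination over sentence vectors might look like, not the authors' implementation. It assumes that neighbors are chosen by cosine similarity, that reconstruction weights come from the standard regularized LLE least-squares problem, and that the "two rounds" are simply two consecutive passes. The function names `lle_weights` and `lle_refine` and the parameters `k`, `reg`, and `passes` are hypothetical.

```python
import numpy as np


def lle_weights(x, neighbors, reg=1e-3):
    """Weights (summing to 1) that best reconstruct x from its neighbors."""
    diff = neighbors - x                 # (k, d): neighbors centered on x
    gram = diff @ diff.T                 # (k, k): local Gram matrix
    # Regularize so the system is solvable when neighbors are collinear
    # or when k exceeds the embedding dimension.
    trace = np.trace(gram)
    gram = gram + np.eye(len(neighbors)) * reg * (trace if trace > 0 else 1.0)
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()


def lle_refine(embeddings, k=5, passes=2, reg=1e-3):
    """Replace each vector by an LLE-weighted combination of its k neighbors.

    Assumption: neighbors are the k most cosine-similar other sentences,
    and the pass is repeated `passes` times (two by default).
    """
    out = np.asarray(embeddings, dtype=float)
    for _ in range(passes):
        norms = np.maximum(np.linalg.norm(out, axis=1, keepdims=True), 1e-12)
        unit = out / norms
        sim = unit @ unit.T              # pairwise cosine similarity
        np.fill_diagonal(sim, -np.inf)   # a sentence is not its own neighbor
        idx = np.argsort(-sim, axis=1)[:, :k]
        refined = np.empty_like(out)
        for i in range(len(out)):
            nbrs = out[idx[i]]
            refined[i] = lle_weights(out[i], nbrs, reg) @ nbrs
        out = refined
    return out


# Usage with random stand-in vectors (real input would be sentence embeddings
# produced by a pre-trained model).
rng = np.random.default_rng(0)
sentence_vectors = rng.normal(size=(20, 8))
refined_vectors = lle_refine(sentence_vectors, k=5, passes=2)
```

The refined vectors keep the same shape as the input, so they can be passed to the same cosine-similarity scoring that would be used on the original embeddings.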

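Keywords: manifold learning; pre-trained model; contrastive learning; sentence embedding; natural language processing; Locally Linear Embedding (LLE)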

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
