Authors: WU Mingyue, ZHOU Dong[1], ZHAO Wenyu[1,2], QU Wei
Affiliations: [1] School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China; [2] Hunan Key Laboratory for Service Computing and Novel Software Technology (Hunan University of Science and Technology), Xiangtan, Hunan 411201, China
Source: Journal of Computer Applications (《计算机应用》), 2023, No. 10, pp. 3062-3069 (8 pages)
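Funding: National Natural Science Foundation of China (61876062); Natural Science Foundation of Hunan Province (2022JJ30020); Scientific Research Project of the Hunan Provincial Department of Education (21A0319).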
Abstract: Sentence embedding is one of the core technologies of natural language processing and affects the quality and performance of natural language processing systems. However, existing methods cannot efficiently infer the global semantic relationships between sentences, so measuring the semantic similarity of sentences in Euclidean space remains problematic. To address this issue, a sentence embedding optimization method based on manifold learning is proposed, starting from the local geometric structure of sentences. The method uses Locally Linear Embedding (LLE) to apply two weighted local linear combinations to each sentence and its semantically similar sentences. This preserves the local geometric information between sentences and helps infer the global geometric information, so that the semantic similarity of sentences in Euclidean space comes closer to human semantic judgments. Experimental results on seven semantic textual similarity tasks show that the average Spearman's Rank Correlation Coefficient (SRCC) of the proposed method is 1.21 percentage points higher than that of the contrastive-learning-based method SimCSE (Simple Contrastive learning of Sentence Embeddings). In addition, when the proposed method is applied to mainstream pre-trained models, the optimized models improve the average SRCC by 3.32 to 7.70 percentage points over the original pre-trained models.
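Keywords: manifold learning; pre-trained model; contrastive learning; sentence embedding; natural language processing; Locally Linear Embedding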
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
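Illustrative sketch (not the authors' implementation): the abstract describes applying LLE-style weighted local linear combinations to a sentence and its semantically similar sentences. The Python code below shows one possible reading of that idea using only NumPy. The function names `lle_weights` and `refine_embeddings`, the cosine-similarity neighbor search, the neighbor count `k`, the regularization constant `reg`, and treating "two" combinations as two successive passes are all assumptions made for illustration; the record does not specify any of them.

```python
import numpy as np


def lle_weights(x, neighbors, reg=1e-3):
    """Standard LLE reconstruction weights for one point.

    Finds w (summing to 1) that minimizes ||x - sum_j w[j] * neighbors[j]||^2.
    `reg` is a hypothetical regularization constant that keeps the local
    Gram matrix invertible.
    """
    diff = neighbors - x                          # (k, d) offsets from x
    gram = diff @ diff.T                          # (k, k) local Gram matrix
    gram = gram + reg * (np.trace(gram) + 1e-12) * np.eye(len(neighbors))
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()                            # enforce the sum-to-one constraint


def refine_embeddings(embeddings, k=5, passes=2):
    """Replace each vector by the LLE-weighted combination of its k nearest
    neighbors under cosine similarity.

    ASSUMPTION: passes=2 is only one interpretation of the "two weighted
    local linear combinations" mentioned in the abstract.
    """
    x = np.asarray(embeddings, dtype=float)
    for _ in range(passes):
        unit = x / np.linalg.norm(x, axis=1, keepdims=True)
        sim = unit @ unit.T                       # pairwise cosine similarity
        np.fill_diagonal(sim, -np.inf)            # a sentence is not its own neighbor
        idx = np.argsort(-sim, axis=1)[:, :k]     # indices of the k most similar sentences
        x = np.stack(
            [lle_weights(x[i], x[idx[i]]) @ x[idx[i]] for i in range(len(x))]
        )
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.normal(size=(20, 8))                # stand-in for real sentence embeddings
    print(refine_embeddings(toy, k=5).shape)
```

In practice the input would be sentence vectors from a pre-trained encoder rather than random data, and the refined vectors would be scored against human similarity ratings with Spearman's rank correlation, as in the evaluation the abstract reports.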