Research on Short Text Matching Algorithm Based on Comprehensive Similarity  (Cited by: 3)


Authors: CHEN Le; WANG Chaoqun; ZOU Quan; WANG Dan; ZHU Xinan (Aerospace Smart Energy Research Institute; Shanghai Aerospace Energy Co., Ltd., Shanghai 201201, China)

Affiliations: [1] Aerospace Smart Energy Research Institute; [2] Shanghai Aerospace Energy Co., Ltd., Shanghai 201201, China

Source: Software Guide (《软件导刊》), 2023, No. 7, pp. 71-78 (8 pages)

Abstract: Traditional short text matching algorithms based on the bag-of-words model suffer from a high-dimensional, sparse feature-word space. Compared with long texts, short texts also carry weak contextual semantic information, which blurs the semantics of feature words and lowers matching accuracy. To address these problems, this paper proposes a short text matching algorithm that fuses semantic similarity and probabilistic relevance. First, the semantic similarity between short texts is computed with truncated singular value decomposition and cosine similarity. Second, the information entropy and standard deviation of each feature word over the semantic dimensions are introduced as that word's deep semantic discrimination, which is used to improve best match; the improved best match then yields the probabilistic relevance between short texts. Finally, short texts are matched using the harmonic mean of semantic similarity and probabilistic relevance. Experiments show that, compared with the traditional algorithm, the proposed algorithm improves matching accuracy by 11.22% and the F1 score by 10.5%.
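The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names (`tfidf_matrix`, `truncated_svd_embed`, `cosine`, `harmonic_mean`), and the choice to return 0 when either score is non-positive are all assumptions made for illustration. The entropy-improved best match is not specified in the abstract, so the probabilistic relevance is passed in as a hypothetical, already-normalized placeholder value.

```python
import numpy as np

def tfidf_matrix(docs):
    """Build a simple TF-IDF document-term matrix (rows = documents).
    Whitespace tokenization and smoothed IDF are illustrative assumptions."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.split():
            tf[i, index[w]] += 1.0
    df = (tf > 0).sum(axis=0)                       # document frequency per word
    idf = np.log((1.0 + len(docs)) / (1.0 + df)) + 1.0
    return tf * idf, vocab

def truncated_svd_embed(X, k):
    """Project documents onto the top-k singular directions (latent semantic space)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def harmonic_mean(x, y):
    """Harmonic mean of two scores; defined as 0.0 here if either is non-positive
    (an assumption, since cosine similarity can be negative)."""
    return 2.0 * x * y / (x + y) if x > 0 and y > 0 else 0.0

# Toy corpus (hypothetical data, for illustration only).
docs = ["cheap flight ticket", "cheap airline ticket", "deep learning model"]
X, vocab = tfidf_matrix(docs)
Z = truncated_svd_embed(X, k=2)

semantic_sim = cosine(Z[0], Z[1])       # semantic similarity of docs 0 and 1
prob_relevance = 0.6                    # placeholder for the improved best-match score
combined = harmonic_mean(semantic_sim, prob_relevance)
print(semantic_sim, combined)
```

The harmonic mean rewards pairs only when both scores are high: a pair with strong semantic similarity but low probabilistic relevance (or the reverse) receives a combined score close to the smaller of the two.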

Keywords: TF-IDF; best match; latent semantic analysis; truncated singular value decomposition; cosine similarity

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
