Authors: CHEN Le; WANG Chaoqun; ZOU Quan; WANG Dan; ZHU Xinan
Affiliations: [1] Aerospace Smart Energy Research Institute; [2] Shanghai Aerospace Energy Co., Ltd, Shanghai 201201, China
Source: Software Guide (《软件导刊》), 2023, No. 7, pp. 71-78 (8 pages)
Abstract: Traditional short-text matching algorithms based on the bag-of-words model suffer from a high-dimensional, sparse feature-word space. Compared with long texts, short texts also carry weak contextual semantic information, which leaves the semantics of feature words ambiguous and lowers matching accuracy. To address these problems, the paper proposes a short-text matching algorithm that fuses semantic similarity and probabilistic relevance. First, truncated singular value decomposition and cosine similarity are used to compute the semantic similarity between short texts. Second, the information entropy and standard deviation of each feature word over its semantic dimensions are introduced as the word's deep semantic discrimination, which is used to improve best match; the improved best match then computes the probabilistic relevance between short texts. Finally, short texts are matched using the harmonic mean of semantic similarity and probabilistic relevance. Experiments show that, compared with traditional algorithms, the proposed algorithm improves matching accuracy by 11.22% and the F1 score by 10.5%.
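Keywords: TF-IDF; best match; latent semantic analysis; truncated singular value decomposition; cosine similarity
Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]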
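The abstract describes a three-stage pipeline: a semantic similarity, a probabilistic relevance, and their harmonic mean. The sketch below illustrates only the building blocks, under several assumptions that are not in the source. It uses the standard Okapi BM25 formula as a stand-in for "best match" and does not reproduce the paper's entropy and standard-deviation improvement. The cosine function expects dense vectors that would come from a truncated SVD step, which is omitted here. The function names and the parameters `k1` and `b` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Standard Okapi BM25 of a tokenized query against one tokenized document.

    Assumption: plain BM25, not the improved variant from the paper.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    term_freq = Counter(doc)
    score = 0.0
    for term in set(query):
        doc_freq = sum(1 for d in corpus if term in d)
        if doc_freq == 0 or term_freq[term] == 0:
            continue
        # Non-negative IDF variant (adds 1 inside the logarithm).
        idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
    return score


def harmonic_mean(x, y):
    """Harmonic mean of two non-negative scores; 0.0 if either is not positive."""
    if x <= 0 or y <= 0:
        return 0.0
    return 2 * x * y / (x + y)
```

A harmonic mean only makes sense when both scores share a scale, so the BM25 relevance would need to be mapped into the same range as the cosine similarity before fusion (for example by dividing by the maximum score over the candidate set). That scaling choice is an assumption here; the abstract does not say how the paper handles it.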