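Authors: Cheng Yusheng (程玉胜) [1,2], Liang Hui (梁辉) [2], Wang Yibin (王一宾) [1,2], Ren Yong (任勇) [2]
Affiliations: [1] School of Computer and Information, Anqing Teachers College (安庆师范学院), Anqing, Anhui 246011; [2] Institute of Statistics, Anqing Teachers College (安庆师范学院), Anqing, Anhui 246011
Source: Computer Engineering and Applications (《计算机工程与应用》), 2016, No. 8, pp. 70-73, 124 (5 pages)
Funding: Anhui Provincial Natural Science Research Project for Universities (No. KJ2013A177); Natural Science Foundation of Anhui Province (No. 10040606Q42)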
Abstract: Text similarity algorithms based on the traditional vector space model account for neither the high dimensionality of text vectors nor minor variations in keywords, so their similarity results are not precise enough. To address this, the paper proposes TSABCLDA (Text Similarity Algorithm Based on Clustering and LD Algorithm), a text similarity algorithm designed for the case of minor keyword variation. First, texts are preprocessed by removing digits, punctuation, and stop words. Next, clustering is used to reduce the low-frequency words in the text, the LD algorithm is used to compute the similarity between feature words, and a text similarity matrix is built. Finally, the similarity between texts is computed from space vectors constructed from the feature-word similarities and their weights. This approach accounts for minor keyword variation and effectively addresses the high dimensionality of text vectors; applied to text mining, it can improve the efficiency of mining similar texts. Experimental results show that, because minor keyword variation is taken into account, the accuracy of the algorithm's text similarity improves markedly within a certain range of threshold values.
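Keywords: clustering; LD algorithm; text similarity matrix; vector space model; text similarity
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]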
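The pipeline described in the abstract (preprocessing, word-level similarity, then a weighted vector comparison) can be illustrated with a minimal Python sketch. This is not the paper's TSABCLDA implementation. It assumes that "LD" refers to Levenshtein (edit) distance, it omits the clustering-based reduction of low-frequency words, and it swaps in a soft-cosine style measure with raw term frequencies as weights. The stop-word list, the function names, and the default `threshold` value are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from math import sqrt

# Illustrative stop-word list only; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def preprocess(text):
    """Lowercase, keep alphabetic runs (drops digits and punctuation), drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping only two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_similarity(a, b):
    """Edit distance normalised to [0, 1]; 1.0 means identical words."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def soft_dot(tf_x, tf_y, threshold):
    """Sum of tf * tf * similarity over word pairs whose similarity passes the threshold."""
    total = 0.0
    for wx, fx in tf_x.items():
        for wy, fy in tf_y.items():
            s = word_similarity(wx, wy)
            if s >= threshold:
                total += s * fx * fy
    return total


def text_similarity(text_a, text_b, threshold=0.8):
    """Soft-cosine style similarity; threshold is an assumed tuning parameter."""
    tf_a = Counter(preprocess(text_a))
    tf_b = Counter(preprocess(text_b))
    if not tf_a or not tf_b:
        return 0.0
    norm = sqrt(soft_dot(tf_a, tf_a, threshold) * soft_dot(tf_b, tf_b, threshold))
    # Clamp: a thresholded similarity matrix is not guaranteed to keep the ratio <= 1.
    return min(1.0, soft_dot(tf_a, tf_b, threshold) / norm)
```

With these assumptions, `text_similarity("color models", "colour model")` evaluates to about 0.83, whereas setting `threshold=1.0` (exact matching only, as in a plain vector space model) gives 0.0 for the same pair. That gap is the effect of minor keyword variation that the abstract describes.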