Authors: YANG Xiao-lan[1], YANG Wei[2], QIAN Cheng[1], ZHU Fu-xi[1,2]
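Affiliations: [1] School of Information and Engineering, Wuchang University of Technology, Wuhan, Hubei 430223, China; [2] Computer School, Wuhan University, Wuhan, Hubei 430072, China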
Source: Computer Engineering and Design, 2017, No. 5, pp. 1258-1263 (6 pages)
Funding: Natural Science Foundation of Hubei Province (2014CFB356)
Abstract: When conventional techniques are applied to short-text clustering, similarity calculation is inaccurate and the clustering results are unstable. To address these problems, a similarity calculation method based on the HowNet semantic lexicon and the BTM (biterm topic model) is proposed: the two similarity measures are linearly combined so that short-text similarity is assessed from both perspectives. Evaluation indices for clustering results, based on clustering quality and on the degree of difference between clusterings, are established to rank the results, and the better-quality results are filtered out and then merged into a consensus clustering with the CSPA (cluster-based similarity partitioning algorithm) fusion method. Experimental results show that the proposed method improves the accuracy of short-text similarity calculation and the stability of the fusion results.
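The abstract names two computational steps that can be sketched concretely: a linear combination of two similarity matrices, and CSPA consensus over several base clusterings. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. The function names and the weight `alpha` are hypothetical (the abstract does not give the weight), the HowNet and BTM similarity matrices are assumed to be precomputed inputs, and the paper's quality/difference filtering index is not reproduced. CSPA as originally described partitions the co-association graph with METIS; here a simple threshold-plus-connected-components step is swapped in as a stand-in for that partitioner.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_similarity(sim_hownet: Matrix, sim_btm: Matrix, alpha: float) -> Matrix:
    """Linear combination alpha * HowNet + (1 - alpha) * BTM.

    alpha is an assumed weight; the paper's actual value is not given in the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * h + (1.0 - alpha) * b for h, b in zip(row_h, row_b)]
        for row_h, row_b in zip(sim_hownet, sim_btm)
    ]


def co_association(partitions: Sequence[Sequence[int]]) -> Matrix:
    """CSPA similarity: fraction of base clusterings that put items i and j together."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    r = float(len(partitions))
    return [[c / r for c in row] for row in counts]


def consensus_by_threshold(s: Matrix, threshold: float = 0.5) -> List[int]:
    """Hypothetical stand-in for METIS: connected components of the graph
    whose edges are the pairs with co-association above the threshold."""
    n = len(s)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and s[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


if __name__ == "__main__":
    # Three hypothetical base clusterings of five short texts.
    base = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
    print(consensus_by_threshold(co_association(base)))  # → [0, 0, 1, 1, 1]
```

Because the co-association matrix ignores the label values themselves and only records which items share a cluster, base clusterings with different label numbering (or different numbers of clusters) can be fused directly, which is the property CSPA relies on.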
Keywords: short text; HowNet; biterm topic model; clustering; fusion
CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]