Short-text clustering method combining HowNet with BTM model  Cited by: 9

Authors: YANG Xiao-lan [1], YANG Wei [2], QIAN Cheng [1], ZHU Fu-xi [1,2]

Affiliations: [1] School of Information Engineering, Wuchang University of Technology, Wuhan 430223, Hubei, China; [2] Computer School, Wuhan University, Wuhan 430072, Hubei, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2017, No. 5, pp. 1258-1263 (6 pages)

Funding: Natural Science Foundation of Hubei Province (2014CFB356)

Abstract: Conventional techniques for short-text clustering suffer from inaccurate similarity computation and unstable clustering results. To address this, a similarity measure based on the HowNet semantic lexicon and the BTM topic model is proposed; the two similarities are combined linearly to assess short-text similarity jointly. An evaluation index based on clustering quality and clustering diversity is established to rate the candidate clustering results. The higher-quality results are retained and then fused with the CSPA ensemble algorithm. Experimental results show that the proposed method improves the accuracy of short-text similarity computation and the stability of the fused results.
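The two computational steps named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight `lam`, the function names, and the matrix layout are assumptions introduced here, and the HowNet and BTM similarity matrices are taken as already computed. `co_association` builds the pairwise similarity that CSPA (cluster-based similarity partitioning) derives from a set of base clusterings; the final graph-partitioning step of CSPA is omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_similarity(sim_hownet: Matrix, sim_btm: Matrix, lam: float = 0.5) -> Matrix:
    """Element-wise linear combination: lam * sim_hownet + (1 - lam) * sim_btm.

    `lam` is a hypothetical mixing weight; the abstract does not give its value.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [
        [lam * h + (1.0 - lam) * b for h, b in zip(row_h, row_b)]
        for row_h, row_b in zip(sim_hownet, sim_btm)
    ]


def co_association(partitions: Sequence[Sequence[int]]) -> Matrix:
    """CSPA-style similarity: the fraction of base clusterings in which
    texts i and j receive the same cluster label."""
    n = len(partitions[0])
    r = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / r for j in range(n)]
        for i in range(n)
    ]
```

In a full pipeline, only the base clusterings that pass the quality and diversity filter would be passed to `co_association`, and the resulting matrix would be partitioned into the final clusters.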

Keywords: short text; HowNet; Biterm topic model (BTM); clustering; ensemble

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
