On-line forum hot topic mining method based on topic cluster evaluation  Cited by: 5

Authors: Jiang Hao (江浩)[1], Chen Xingshu (陈兴蜀)[1], Du Min (杜敏)[1]

Affiliation: [1] College of Computer Science, Sichuan University, Chengdu 610065

Source: Journal of Computer Applications (《计算机应用》), 2013, No. 11, pp. 3071-3075 (5 pages)

Funding: National Key Technology R&D Program of China (2012BAH18B05)

Abstract: Hot topic mining is an important technical foundation for public opinion monitoring. Existing forum hot topic mining methods do not handle the heavy lexical noise in forum data and rely on a single measure of topic heat. To address both problems, a hot topic mining method based on topic cluster evaluation is proposed. Forum text is modeled with the Latent Dirichlet Allocation (LDA) topic model; the documents, mapped into topic space, have their topic noise removed and are then clustered with K-means++, a variant of K-means with an optimized choice of initial cluster centers. Finally, the clusters are evaluated on three criteria: topic burstiness, topic purity, and cluster attention. Experimental analysis shows that clustering quality and clustering speed are best when the topic noise threshold is set to 0.75 and the number of cluster centers is set to 50. Tests on a real data set show that the method effectively ranks clusters by their likelihood of containing a hot topic. Finally, a method for displaying hot topics is designed.
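The clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each document has already been mapped to an LDA topic-distribution vector, and it leaves out the topic-noise removal step, which the abstract does not define. The function names (`sq_dist`, `kmeanspp_seeds`, `kmeans`) and the toy data are hypothetical. The sketch uses the standard K-means++ seeding rule, in which each new center is drawn with probability proportional to its squared distance from the nearest center already chosen, followed by ordinary K-means iterations.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """Choose k initial centers with K-means++ (D^2-weighted) sampling."""
    centers = [list(rng.choice(points))]
    # d2[i] is the squared distance from points[i] to its nearest center so far.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(list(points[idx]))
        d2 = [min(old, sq_dist(p, centers[-1])) for old, p in zip(d2, points)]
    return centers


def kmeans(points, k, iters=20, seed=0):
    """K-means with K-means++ seeding; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = kmeanspp_seeds(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy stand-in for LDA output: six documents over three topics (assumed data).
docs = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.05, 0.90],
    [0.05, 0.10, 0.85],
]
labels, centers = kmeans(docs, k=3)
print(labels)
```

The abstract reports 50 cluster centers as the best setting on the authors' data; the toy example uses `k=3` only to stay readable.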

Keywords: Latent Dirichlet Allocation (LDA); topic model; K-means++ clustering; cluster evaluation; hot topic

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
