Hotspot Detection in Microblog Public Opinion Based on Biterm Topic Model (BTM)  Cited by: 27


Authors: Wang Yamin (王亚民)[1], Hu Yue (胡悦)[1]

Affiliation: [1] School of Economics and Management, Xidian University, Xi'an 710071, China

Source: Journal of Intelligence (《情报杂志》), 2016, No. 11, pp. 119-124, 140 (7 pages in total)

Abstract: [Purpose/Significance] As an emerging social news medium, microblogging has played an important role in the release and spread of many hotspot events in recent years. Because of the particular nature of microblog text, traditional methods cannot model it effectively to discover hot topics. How to find and extract meaningful hotspot information from microblog data efficiently and accurately is therefore a valuable research problem. [Method/Process] This paper proposes a method for discovering hotspots in microblog public opinion based on the biterm topic model (BTM). First, microblog texts are modeled with BTM, and the TF-IDF weighting algorithm is improved to suit the characteristics of short microblog texts. The BTM modeling result is then combined with the improved TF-IDF weighting to extract features from microblog texts and measure their similarity, after which K-means clustering is used to discover hot topics. [Result/Conclusion] Comparative experiments on a Sina Weibo dataset, together with an analysis of the results, verify the effectiveness of the method. The method effectively addresses the high dimensionality and sparsity that traditional models face in text modeling, and it significantly improves the quality of hot topic discovery.

Keywords: biterm topic model; short text; microblog public opinion; similarity measurement

Classification code: G350 [Cultural Science: Information Science]
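
The pipeline outlined in the abstract (biterm-based modeling, term weighting, similarity measurement, clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the improved TF-IDF formula or how it is combined with the BTM output, so this sketch substitutes plain TF-IDF, omits BTM inference entirely, and shows only the biterm extraction step that feeds BTM. The function names, the toy posts, and the naive seeding of the clusters are all assumptions made for illustration, and the posts are assumed to be already tokenized.

```python
import math
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair co-occurring in one short text.

    These pairs are the units BTM models; BTM inference itself is omitted.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def tfidf_vectors(docs):
    """Plain TF-IDF as sparse dicts (NOT the paper's improved weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid division by zero on empty posts
        vectors.append(
            {t: (c / length) * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def kmeans_cosine(vectors, k, iterations=10):
    """Tiny k-means variant: cosine assignment, first k vectors as seeds."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its most similar centroid.
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels


if __name__ == "__main__":
    # Hypothetical, pre-tokenized posts used only for illustration.
    posts = [
        ["earthquake", "rescue", "team"],
        ["movie", "premiere", "tickets"],
        ["earthquake", "rescue", "volunteers"],
        ["movie", "tickets", "sold"],
    ]
    print(extract_biterms(posts[0]))
    print(kmeans_cosine(tfidf_vectors(posts), k=2))
```

Seeding the clusters with the first k posts keeps the sketch deterministic. A real run would need a proper initialization strategy and, following the abstract, document features that combine the BTM topic output with the improved TF-IDF weights.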

 
