基于动态主题模型融合多维数据的微博社区发现算法  被引量:25

Microblog Community Discovery Algorithm Based on Dynamic Topic Model with Multidimensional Data Fusion

在线阅读下载全文

作  者:刘冰玉[1] 王翠荣[1] 王聪[1] 王军伟[1] 王兴伟[2] 黄敏[1] 

机构地区:[1]东北大学计算机科学与工程学院,辽宁沈阳110819 [2]东北大学软件学院,辽宁沈阳110819

出  处:《软件学报》2017年第2期246-261,共16页Journal of Software

基  金:国家杰出青年科学基金(61225012;71325002);国家自然科学基金(61572123;61300195);高等学校博士学科点专项科研基金(20120042130003);辽宁省百千万人才工程项目(2013921068);河北省自然科学基金(F2014501078);河北省科技计划(15210146)~~

摘  要:随着微博用户的不断增加,微博网络已成为用户进行信息交流的平台.针对由于博文长度受限,传统的社区发现算法无法有效解决微博网络的稀疏性等问题,提出了DC-DTM(discovery community by dynamic topic model)算法.DC-DTM算法首先将微博网络映射为有向加权网络,网络中边的方向反映节点之间的关注关系,利用所提出的DTM(dynamic topic model)计算出节点之间的语义相似度,并将其作为节点间连边的权重.DTM是一种微博主题模型.该模型不仅能够挖掘博客的主题分布,而且能够计算出某一主题中用户的影响力大小.其次,利用所提出的复杂度较低的标签传播算法WLPA(weighted lebel propagation)进行微博网络的社区发现.该算法的初始化阶段将影响力大的用户节点作为初始节点,标签按照节点的影响力从大到小进行传播,避免了传统标签传播算法逆流现象的发生,提高了标签传播算法的稳定性.真实数据上的实验结果表明,DTM模型能够很好地对微博进行主题挖掘,DC-DTM算法能够有效地挖掘出微博网络的社区.With the dramatic increase of microblog users, microblog websites have become the platform for a wide spectrum of users to get information. Due to the fact that blog is a special type of text with restricted length, traditional community detection algorithms cannot effectively solve the sparse problem of micro blog. To address the issue, the DC-DTM(discovery community by dynamic topic model) algorithm is proposed in this paper. First, the algorithm maps microblog as a directed-weighted network, in which the direction is the concerned relationship, and the weight is the topic's similarity of different nodes calculated by DTM(dynamic topic model). DTM is a microblog topic model which can not only mine the topics of each microblog accurately but also calculate author's influence a topic. Second, the algorithm uses label propagation WLPA(weighted lebel propagation), with low complexity, to find communities in microblog. The initial process selects nodes with the largest influence as the initial nodes, and propagates the label in the order of node's influences, from large to small. The algorithm overcomes the adverse phenomenon in the traditional label propagation algorithm, and has better stability. Experiments on real data show that the DTM model can be very good for the topic mining in microblog and DC-DTM algorithm can effectively discover the communities of microblog.

关 键 词:新浪微博 文本挖掘 DC-DTM 吉布斯采样 LDA 主题模型 

分 类 号:TP181[自动化与计算机技术—控制理论与控制工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象