Affiliations: [1] School of Computer Science, National University of Defense Technology, Changsha 410073, China; [2] College of Information and Management, National University of Defense Technology, Changsha 410073, China
Source: China Communications (中国通信, English edition), 2014, No. 8, pp. 131-144 (14 pages)
Funding: This work was supported by the National High Technology Research and Development Program of China (No. 2010AA012505, 2011AA010702, 2012AA01A401 and 2012AA01A402), Chinese National Science Foundation (No. 60933005, 91124002, 61303265), National Technology Support Foundation (No. 2012BAH38B04) and National 242 Foundation (No. 2011A010).
Abstract: Microblogs have become an important platform for people to publish and spread information and to acquire knowledge. This paper addresses the problem of discovering user interest in microblogs. We propose a topic mining model based on Latent Dirichlet Allocation (LDA), named the user-topic model. For each user, interests are divided into two parts according to how the microblogs are generated: original interest and retweet interest. We present a Gibbs sampling implementation for inferring the parameters of the model, which discovers both a user's original interest and retweet interest. We then combine original interest and retweet interest to compute interest words for users. Experiments on a dataset of Sina microblogs demonstrate that our model discovers user interest effectively and outperforms existing topic models on this task. We also find that original interest and retweet interest are similar and that the topics of interest contain user labels. The interest words discovered by our model reflect user labels, but their range is much broader.
Keywords: microblogs; topic mining; user interest; LDA; user-topic model
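Classification: TP393.4 [Automation and Computer Technology: Computer Application Technology]; TP391.72 [Automation and Computer Technology: Computer Science and Technology]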