Research on Microblog Topic Mining Based on the MB-HL Model    Times cited: 1

Study of topic mining for microblog based on MB-HL model

Authors: Jiang Quan[1], Zheng Shanhong[1], Liu Kai[1], Li Wanlong[1] (College of Computer Science & Engineering, Changchun University of Technology, Changchun 130012, China)

Affiliation: [1] College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012

Source: Application Research of Computers (《计算机应用研究》), 2018, No. 11, pp. 3298-3301, 3306 (5 pages)

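Funding: Natural Science Foundation of Jilin Province (20130101060JC); Science and Technology Research Fund of the Jilin Provincial Department of Education, 12th Five-Year Plan (2014131; 2014125)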

Abstract: Traditional text topic models mine microblog topics with low accuracy and ignore the relationships between topics. To address these problems, and taking into account the characteristics of Chinese microblog corpora, this paper analyzes the strengths and weaknesses of LDA (latent Dirichlet allocation) and HMM (hidden Markov model) and proposes a microblog topic mining model, MB-HL (microblog-HMM&LDA). The model treats each individual microblog post as a processing unit and builds and optimizes a distributed topic-word matrix. It uses LDA to model the different behaviors of microblog users and to extract features, and it exploits the strong temporal state-modeling capability of HMM to compensate for LDA's weakness in capturing correlations between topics. Inference is carried out with Gibbs sampling. Comparative experiments on real Sina Weibo data show that MB-HL improves the accuracy of topic keywords by nearly 9% and effectively discovers relationships between topics.
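The abstract names the two generic ingredients of MB-HL (Gibbs sampling for LDA over per-post documents, and HMM-style modeling of how topics follow one another in time) but gives no equations. The sketch below illustrates only those generic ingredients under stated assumptions; it is not the authors' MB-HL model. `lda_gibbs` is a standard collapsed Gibbs sampler for plain LDA that treats each microblog post as one document and resamples every token's topic from the usual full conditional p(z_i = k | rest) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ). `topic_transitions` is a deliberately simplified, fully observed Markov-chain estimate of topic-to-topic transition probabilities over time-ordered posts: a stand-in for the HMM component with no hidden states and no forward-backward inference. The function names, the hyperparameters α = 0.1 and β = 0.01, and the toy posts are all hypothetical; the paper's actual coupling of LDA and HMM and its optimization of the topic-word matrix are not reproduced.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (hypothetical sketch, not MB-HL).

    docs: list of token lists; each microblog post is treated as one document.
    Returns (ndk, nkw, vocab): post-topic counts, topic-word counts, vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # tokens in post d assigned to topic k
    nkw = [[0] * V for _ in range(K)]   # occurrences of word v assigned to topic k
    nk = [0] * K                        # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initialisation of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from all counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | rest), up to a normalising constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def top_words(nkw, vocab, n=3):
    """Highest-count words per topic, a rough 'topic keyword' list."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in nkw
    ]


def topic_transitions(topic_seq, num_topics, smoothing=1.0):
    """Add-`smoothing` estimate of P(next topic | current topic).

    Fully observed Markov chain over time-ordered post topics, used here as a
    simplified stand-in for the HMM component. smoothing must be > 0 so that
    topics with no outgoing transitions still get a valid distribution.
    """
    counts = [[smoothing] * num_topics for _ in range(num_topics)]
    for a, b in zip(topic_seq, topic_seq[1:]):
        counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]


if __name__ == "__main__":
    # Hypothetical, already word-segmented posts, listed in time order.
    posts = [["match", "goal", "team"], ["goal", "team", "fans"],
             ["phone", "launch", "price"], ["price", "phone", "review"],
             ["team", "match", "fans"]]
    ndk, nkw, vocab = lda_gibbs(posts, num_topics=2, iters=100)
    print(top_words(nkw, vocab))
    # Dominant topic of each post, then transition estimates between them.
    dominant = [max(range(2), key=lambda k: row[k]) for row in ndk]
    print(topic_transitions(dominant, 2))
```

Because the sampler is stochastic, the keyword lists depend on the seed and the number of iterations. Real Weibo text would first need Chinese word segmentation and stop-word removal, which this sketch assumes has already been done.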

Keywords: microblog; topic mining; latent Dirichlet allocation (LDA); hidden Markov model (HMM); MB-HL model; Gibbs sampling

CLC number: TP391 (Automation and Computer Technology: Computer Application Technology)
