Hot Topic Discovery on Microblogs Based on Growth Speed  Cited by: 17

Published English title: Hot topics found on micro-blog based on speed growth


Authors: Xue Suzhi (薛素芝)[1,2], Lu Ran (鲁燃)[1,2], Ren Yuanyuan (任圆圆)[1,2]

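Affiliations: [1] School of Information Science and Engineering, Shandong Normal University, Jinan 250014; [2] Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan 250014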

Source: Application Research of Computers (《计算机应用研究》), 2013, Issue 9, pp. 2598-2601 (4 pages)

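Funding: National Natural Science Foundation of China (60873247); Natural Science Foundation of Shandong Province (ZR2009GZ007; ZR2011FM030); National Social Science Fund of China (12BXW040); Ministry of Public Security Science and Technology Innovation Program (2011YYCXSDST057)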

Abstract: Traditional hot topic detection methods perform poorly on microblogs because posts are short, contain few words, and use non-standard wording. To address this problem, the paper proposes a microblog hot topic discovery method based on growth speed. First, the preprocessed microblog posts are divided into windows that each contain the same number of posts, the frequency of every word in each window is counted, and the result is expressed as a time two-tuple sequence. Second, the growth slope of each word between every pair of adjacent windows is computed to find fast-growing words. Third, the growth speed of the number of users and of the number of posts associated with such a word is computed to decide whether the word is a hot topic word. Finally, hot topics are produced by clustering the hot topic words. Experiments verify the feasibility of the method. The results show that it improves detection efficiency to some extent, reduces the miss rate and the false detection rate, and can discover hot topics on microblogs effectively and in a timely manner.

Keywords: growth slope; growth speed; time two-tuple sequence; hot topic discovery

Classification: TP391.3 [Automation and Computer Technology: Computer Application Technology]
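
The four steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (split_into_windows, word_stats, hot_words, cluster_hot_words), the post layout (a dict with "user" and "words"), and all threshold values are hypothetical. The growth slope is taken as the difference between adjacent windows, assuming unit spacing between windows. The abstract does not name a clustering algorithm, so grouping hot words by co-occurrence is only a stand-in.

```python
from collections import Counter, defaultdict
from itertools import combinations


def split_into_windows(posts, window_size):
    """Split the post stream into consecutive windows of equal post count."""
    return [posts[i:i + window_size] for i in range(0, len(posts), window_size)]


def word_stats(windows):
    """Per word, build per-window sequences of term frequency,
    distinct-user count, and post count (index = window number)."""
    n = len(windows)
    freq = defaultdict(lambda: [0] * n)
    posts = defaultdict(lambda: [0] * n)
    users = defaultdict(lambda: [set() for _ in range(n)])
    for t, window in enumerate(windows):
        for post in window:
            for w in post["words"]:
                freq[w][t] += 1
            for w in set(post["words"]):
                posts[w][t] += 1
                users[w][t].add(post["user"])
    user_counts = {w: [len(s) for s in sets] for w, sets in users.items()}
    return dict(freq), user_counts, dict(posts)


def hot_words(windows, min_freq_slope=3, min_user_slope=2, min_post_slope=2):
    """Return words whose frequency, user count, and post count all grow
    by at least the given (assumed) thresholds between two adjacent windows."""
    freq, users, posts = word_stats(windows)
    hot = set()
    for w in freq:
        for t in range(1, len(windows)):
            # Slope between adjacent windows, assuming unit window spacing.
            if (freq[w][t] - freq[w][t - 1] >= min_freq_slope
                    and users[w][t] - users[w][t - 1] >= min_user_slope
                    and posts[w][t] - posts[w][t - 1] >= min_post_slope):
                hot.add(w)
                break
    return hot


def cluster_hot_words(posts, hot, min_cooccur=2):
    """Stand-in clustering: link hot words that co-occur in at least
    min_cooccur posts and return the connected groups as topics."""
    pair_counts = Counter()
    for post in posts:
        present = sorted(hot & set(post["words"]))
        pair_counts.update(combinations(present, 2))

    parent = {w: w for w in hot}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for w in hot:
        groups[find(w)].add(w)
    return list(groups.values())
```

Requiring growth in users and posts as well as in raw frequency follows the abstract's third step: a word repeated many times by a single account raises its frequency slope but not its user slope, so it is not flagged as a hot topic word.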

 
