Micro-blog hot-spot topic discovery based on real-time word co-occurrence network (original title: 基于实时词共现网络的微博话题发现)

Cited by: 5


Authors: Li Yaxing (李亚星) [1], Wang Zhaokai (王兆凯) [1], Feng Xupeng (冯旭鹏) [2], Liu Lijun (刘利军) [1], Huang Qingsong (黄青松) [1,3]

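Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; [2] Educational Technology and Network Center, Kunming University of Science and Technology, Kunming 650500; [3] Yunnan Key Laboratory of Computer Technology Applications (Kunming University of Science and Technology), Kunming 650500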

Source: Journal of Computer Applications (《计算机应用》), 2016, No. 5, pp. 1302-1306 (5 pages)

Funding: National Natural Science Foundation of China (81360230)

Abstract: To address the real-time, sparse, and massive nature of micro-blog data, a topic discovery model based on a real-time word co-occurrence network is proposed. First, a set of topic words is extracted from the raw corpus, and the relationship weights between co-occurring topic words are computed using a time parameter to build the word co-occurrence network. Latent feature words that are strongly associated with a topic are then inferred from this network, based on a weight adjustment coefficient, which reduces the sparsity of micro-blog feature words. Second, an improved Single-Pass algorithm performs incremental topic clustering. Finally, the topic words of each topic are ranked by a heat score to obtain the most representative keywords of the topic. Experimental results show that, compared with the classic Single-Pass clustering algorithm, the model improves topic discovery accuracy by about 6% and the comprehensive index by 8%, which supports the validity and accuracy of the proposed model.
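For orientation, the classic Single-Pass baseline that the abstract compares against can be sketched roughly as follows. This is a minimal illustration and not the authors' improved algorithm: the cosine similarity over raw term counts, the summed-count cluster representation, the `threshold` value of 0.3, and the toy `posts` are all assumptions made for this example. The co-occurrence network and heat ranking described in the abstract are not reproduced here.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Assign each tokenized post, in arrival order, to the most similar
    existing cluster, or open a new cluster if none reaches the threshold.
    The threshold is a hypothetical value chosen for this sketch."""
    centroids = []  # one summed term-frequency Counter per cluster
    labels = []     # cluster index assigned to each post
    for tokens in docs:
        vec = Counter(tokens)
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the post into the cluster
            labels.append(best)
        else:
            centroids.append(vec)        # start a new topic cluster
            labels.append(len(centroids) - 1)
    return labels, centroids


# Toy, made-up posts (already tokenized) used only to exercise the sketch.
posts = [
    ["earthquake", "rescue", "team"],
    ["earthquake", "rescue", "donation"],
    ["football", "final", "goal"],
    ["football", "goal", "penalty"],
]
labels, centroids = single_pass(posts, threshold=0.3)
```

Because each post is compared only against the clusters seen so far, the result depends on arrival order and on the threshold, which is one reason a sparse short-text setting benefits from enriching each post with additional related words before clustering.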

Keywords: topic discovery; real-time co-occurrence network; short text; Single-Pass clustering; heat calculation

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
