Microblog Topic Detection Method Based on Word Co-occurrence Relations and Rough Sets  (Cited by: 1)

News Topic Detection on Chinese Microblog Based on Rough Set and Word Co-Occurrence


Authors: Lan Tian (兰天)[1,2], Guo Gongde (郭躬德)[1,2]

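Affiliations: [1] School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007; [2] Fujian Provincial Key Laboratory of Network Security and Cryptology, Fujian Normal University, Fuzhou 350007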

Source: Computer Systems & Applications (《计算机系统应用》), 2016, No. 6, pp. 17-24 (8 pages)

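Funding: National Natural Science Foundation of China (61070062; 61175123); Major Science and Technology Project for Industry-University Cooperation of Fujian Universities (2010H6007)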

Abstract: Traditional word co-occurrence methods for topic detection on microblogs suffer from high computational complexity, long running time, and low recall and precision. To address these problems, this paper proposes an improved word co-occurrence algorithm based on rough set theory (RSCW). The method builds a word co-occurrence matrix from word co-occurrence relations, extracts maximal complete subgraphs from that matrix to serve as topic cluster centers, and then applies rough set theory to identify the keyword set of each topic. Experiments on the NLPIR microblog content corpus and on a microblog data set collected in real time show that the method effectively detects breaking news topics in large-scale microblog data, improves the recognition rate for breaking news, and supports news topic tracking.
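The three stages named in the abstract (co-occurrence counts, maximal complete subgraphs as topic centers, rough-set keyword sets) can be sketched in Python as follows. This is only an illustrative sketch, not the RSCW algorithm itself: the abstract does not give the paper's thresholds or its exact rough-set formulation. The example assumes a tolerance-rough-set-style upper approximation, and the function names, the two thresholds (2 and 1), and the toy posts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(posts):
    """Count, for each unordered word pair, how many posts contain both words."""
    counts = Counter()
    for words in posts:
        for a, b in combinations(sorted(set(words)), 2):
            counts[(a, b)] += 1
    return counts


def build_graph(counts, theta):
    """Adjacency sets: link two words that co-occur in at least `theta` posts."""
    graph = {}
    for (a, b), n in counts.items():
        if n >= theta:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal complete subgraphs with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def upper_approximation(graph, core):
    """Assumed rough-set step: keep every word whose tolerance class
    (the word plus its neighbours in `graph`) intersects the topic core."""
    return {w for w in graph if ({w} | graph[w]) & core}


# Hypothetical toy posts, already segmented into words.
posts = [
    ["earthquake", "rescue", "sichuan"],
    ["earthquake", "rescue", "sichuan", "donation"],
    ["earthquake", "sichuan", "rescue"],
    ["concert", "ticket"],
    ["concert", "ticket", "donation"],
]

counts = cooccurrence_counts(posts)
core_graph = build_graph(counts, theta=2)   # strict graph for topic centers
loose_graph = build_graph(counts, theta=1)  # looser graph for tolerance classes

for core in maximal_cliques(core_graph):
    keywords = upper_approximation(loose_graph, core)
    print(sorted(core), "->", sorted(keywords))
```

On this toy data the strict graph yields two topic cores, {earthquake, rescue, sichuan} and {concert, ticket}. The word "donation" co-occurs only once with members of each core, so it stays out of both cliques but falls into the upper approximation of both. That illustrates how a rough boundary can recover keywords a hard clustering would drop.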

Keywords: microblog; word co-occurrence graph; rough set; topic detection

CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
