Novel Approach to Internet Hot Topic Extraction  Cited by: 6

Authors: 蒙祖强 (Meng Zuqiang)[1], 黄柏雄 (Huang Baixiong)[1]

Affiliation: [1] School of Computer, Electronics and Information, Guangxi University, Nanning 530004

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2013, No. 4, pp. 743-748 (6 pages)

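Funding: Supported by the National Natural Science Foundation of China (61063032); the Guangxi Natural Science Foundation (2012GXNSFAA053225); and the Scientific Research Fund of the Guangxi Education Department (201012MS010)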

Abstract: Internet hot topic extraction is an important means of internet public opinion analysis and has become one of the active research topics in information retrieval. Traditional clustering methods produce mutually exclusive classes (clusters may not overlap), which, among other factors, exposes many shortcomings when they are used for topic discovery based on (subject) word clustering. This paper builds a word co-occurrence network model based on small-world theory and removes a large number of redundant words, then applies the maximal consistent block technique to the filtered word co-occurrence network to extract overlapping topics, with each resulting class corresponding to a hot topic. The proposed method is essentially different from traditional clustering methods; it has particular advantages for topic discovery based on (subject) word clustering and overcomes shortcomings of existing methods. Experiments show that the proposed method is effective and feasible for internet hot topic extraction, performs better than comparable algorithms, and scales well.
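The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the small-world-based filtering works, so a plain co-occurrence threshold (`min_count`, an assumed parameter) stands in for it here. Maximal consistent blocks of a symmetric, reflexive relation are the maximal sets in which every pair is related, so the sketch enumerates them as maximal cliques of the co-occurrence graph using the basic Bron-Kerbosch procedure. The function names and the toy documents are hypothetical.

```python
from itertools import combinations
from collections import Counter


def build_cooccurrence_graph(docs, min_count=2):
    """Link two words when they co-occur in at least `min_count` documents.

    The threshold is an assumed stand-in for the paper's redundant-word
    filtering, which the abstract does not detail.
    """
    pair_counts = Counter()
    for words in docs:
        for a, b in combinations(sorted(set(words)), 2):
            pair_counts[(a, b)] += 1
    graph = {}
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def maximal_blocks(graph):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    blocks = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            blocks.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return blocks


# Toy corpus: each document is a list of already-extracted subject words.
docs = [
    ["earthquake", "rescue", "relief"],
    ["earthquake", "rescue", "relief"],
    ["rescue", "donation", "charity"],
    ["rescue", "donation", "charity"],
]
graph = build_cooccurrence_graph(docs, min_count=2)
topics = maximal_blocks(graph)
```

In this toy corpus the word "rescue" ends up in both resulting blocks, which is the overlapping behavior that mutually exclusive clustering cannot express.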

Keywords: hot topic; maximal consistent block; word co-occurrence network; word clustering; text clustering

Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]

 
