Affiliation: [1] School of Computer and Electronic Information, Guangxi University, Nanning 530004
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2013, No. 4, pp. 743-748 (6 pages)
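Funding: supported by the National Natural Science Foundation of China (61063032), the Guangxi Natural Science Foundation (2012GXNSFAA053225), and the Scientific Research Foundation of the Guangxi Education Department (201012MS010)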
Abstract: Internet hot topic extraction is an important means of Internet public opinion analysis and has become one of the hot research topics in information retrieval. Traditional clustering methods produce mutually exclusive clusters, which causes many shortcomings when they are used for topic discovery based on (subject) word clustering. This paper builds a subject word co-occurrence network based on small-world theory and removes a large number of redundant words, then applies the maximal consistent block technique to the filtered co-occurrence network to extract overlapping topics, with each resulting class corresponding to a hot topic. The proposed method differs fundamentally from traditional clustering methods, has particular advantages for topic discovery based on subject word clustering, and overcomes several shortcomings of existing methods. Experiments show that the method is effective and feasible for Internet hot topic extraction, outperforms comparable algorithms, and scales well.
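To make the maximal consistent block idea concrete, here is a minimal sketch, not the authors' implementation. It assumes that two words are "consistent" when they co-occur in at least `min_count` documents (a hypothetical threshold), and it omits the small-world-based redundant-word filtering, whose details are not given in this record. Under a symmetric, reflexive co-occurrence relation, a maximal consistent block is exactly a maximal clique of the co-occurrence graph, so the sketch enumerates blocks with the Bron-Kerbosch algorithm (a swapped-in enumeration technique). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(docs, min_count=2):
    """Link two words if they appear together in at least `min_count` documents.

    `docs` is a list of token lists; each pair is counted once per document.
    Returns an adjacency map: word -> set of co-occurring words.
    """
    pair_counts = Counter()
    for tokens in docs:
        # Deduplicate and sort so each unordered pair has a single key.
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
    adj = {}
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def maximal_consistent_blocks(adj):
    """Enumerate maximal cliques (Bron-Kerbosch with pivoting).

    Each maximal clique is a maximal consistent block of the co-occurrence
    relation; a word may belong to several blocks (overlapping topics).
    """
    if not adj:
        return []
    blocks = []

    def expand(r, p, x):
        if not p and not x:
            blocks.append(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return blocks


def extract_topics(docs, min_count=2):
    """Hypothetical pipeline: co-occurrence graph -> maximal consistent blocks."""
    adj = build_cooccurrence_graph(docs, min_count)
    return sorted(sorted(block) for block in maximal_consistent_blocks(adj))


if __name__ == "__main__":
    docs = [
        ["earthquake", "rescue", "relief"],
        ["earthquake", "rescue", "relief", "donation"],
        ["earthquake", "donation"],
        ["earthquake", "donation", "charity"],
        ["stock", "market", "crash"],
        ["stock", "market", "crash"],
    ]
    for topic in extract_topics(docs):
        print(topic)
```

On this toy input, "earthquake" lands in two blocks ({earthquake, relief, rescue} and {donation, earthquake}), which is the kind of overlap that a partitioning clustering method cannot express.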
Keywords: hot topic; maximal consistent block; word co-occurrence network; word clustering; text clustering
CLC number: TP393 [Automation and Computer Technology: Computer Application Technology]