Authors: FENG Xupeng[1], MA Zhen[1], XIE Bo[1], LIU Lijun[2], HUANG Qingsong[2]
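Affiliations: [1] Educational Technology and Campus Network Center, Kunming University of Science and Technology, Kunming 650500, China; [2] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China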
Source: Computer Engineering and Applications (《计算机工程与应用》), 2017, No. 8, pp. 81-86 (6 pages)
Funding: National Natural Science Foundation of China (No. 81360230; No. 81560296)
Abstract: Short texts, non-standard language and heavy noise in microblogs prevent traditional topic detection methods from reliably finding new topics, and these methods also ignore the posting time of microblog posts. To address these characteristics and the dynamic nature of topics, this paper proposes a microblog topic detection method based on clustering ensemble. The method takes into account a nonlinear time factor of microblog posting, uses an improved K-Means algorithm to build a base clusterer for each microblog feature, and evaluates the effectiveness of each base clusterer and the diversity among them to set the ensemble voting weights before performing the final clustering ensemble. Comparative experiments show that the method improves the accuracy of microblog topic detection by about 9.5% and detects new topics more effectively.
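The abstract describes the ensemble step only at a high level. The sketch below shows one common way to realize weighted voting over base clusterings, a weighted co-association matrix, and is not the authors' implementation. The nonlinear time factor and the improved K-Means base clusterers are not shown; base clusterings arrive as precomputed label lists. The function names (`rand_index`, `ensemble_weights`, `consensus`), the use of the Rand index as the diversity measure, the "effectiveness times diversity" weighting and the 0.5 vote threshold are all illustrative assumptions.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of item pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def ensemble_weights(labelings, quality):
    """Assumed weighting: effectiveness score times mean diversity, normalized to sum to 1.

    Diversity of a base clusterer = 1 - its mean Rand index against the other clusterers.
    """
    n = len(labelings)
    raw = []
    for k in range(n):
        others = [rand_index(labelings[k], labelings[m]) for m in range(n) if m != k]
        diversity = 1.0 - sum(others) / len(others) if others else 1.0
        raw.append(quality[k] * diversity)
    total = sum(raw)
    if total == 0:  # all clusterers identical: fall back to equal votes
        return [1.0 / n] * n
    return [r / total for r in raw]


def consensus(labelings, weights, threshold=0.5):
    """Weighted co-association voting: posts i and j are merged when the total weight of
    base clusterers putting them together exceeds the threshold (union-find over pairs)."""
    size = len(labelings[0])
    parent = list(range(size))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(size), 2):
        vote = sum(w for lab, w in zip(labelings, weights) if lab[i] == lab[j])
        if vote > threshold:
            parent[find(i)] = find(j)

    # Relabel final clusters as 0, 1, 2, ... in order of first appearance.
    roots, out = {}, []
    for i in range(size):
        out.append(roots.setdefault(find(i), len(roots)))
    return out


if __name__ == "__main__":
    # Hypothetical labels for 6 posts from three feature-specific base clusterers.
    labs = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
    ]
    # Third clusterer is assumed to be judged less effective.
    w = ensemble_weights(labs, quality=[1.0, 1.0, 0.5])
    print(consensus(labs, w))  # [0, 0, 0, 1, 1, 1]
```

Multiplying effectiveness by diversity rewards base clusterers that are both accurate and contribute information the others do not; a clusterer that disagrees with everyone but scores low on effectiveness still receives a small vote.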
Keywords: short text; noise; topic detection; dynamics; nonlinear time; base clusterer; clustering ensemble
CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]