检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:李海明 LI Hai-ming(College of Computer Science and Engineering,Shandong University of Science and Technology,Qingdao 266590,China)
机构地区:[1]山东科技大学计算机科学与工程学院
出 处:《软件导刊》2019年第9期173-175,182,共4页Software Guide
摘 要:为及时从海量微博信息中迅捷有效提取出微博热点话题、事件,提出基于频繁集的聚类SSDKmeans算法,在有限空间下统计分词的近似频数,并在此基础上构建文本向量空间模型,在聚类生成的每个话题簇中提炼话题关键词。通过对2万条微博数据进行有效性验证,结果表明,基于SSDKmeans算法的话题发现有较高的召回率和精准率,分别为91.3%、92.1%。SSDKmeans算法能够有效提高微博热点话题发现率,进而及时了解社会热点话题与舆论趋势。In order to quickly and effectively generate hot topics and events from the massive micro-blog information, in this paper, a clustering algorithm based on SSDKmeans of frequent sets is proposed to calculate the approximate frequency of word segmentation in finite space, and on this basis, a text vector space model is constructed to extract topic keywords in each topic cluster generated by clustering. The validity of 20 000 real microblog data is verified. The experimental results show that topic discovery based on SSDKmeans algorithm has higher recall rate and precision rate, 91.3% and 92.1% respectively. SSDKmeans algorithm can effectively improve the discovery of hot topics in Microblog, and then more timely understand the social hot topics, public opinion trends.
分 类 号:TP391[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.222