Authors: CHEN Yangjian (陈阳键)[1]; WEN Qiuhua (温秋华)
Affiliations: [1] Digital Service Center, Guangzhou Open University (Guangzhou Radio and Television University), Guangzhou, Guangdong 510000, China; [2] School of Information Science and Technology, Jinan University, Guangzhou, Guangdong 510000, China
Source: Journal of Terahertz Science and Electronic Information Technology (《太赫兹科学与电子信息学报》), 2023, No. 3, pp. 378-383, 391 (7 pages)
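Funding: Supported by the Ninth Batch of Education and Teaching Reform Fund Projects for Universities of Guangzhou, Guangdong Province (2017F10).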
Abstract: Micro-blog text data is high-dimensional and marked by synonymy and polysemy. Traditional hot-topic detection methods that combine the Vector Space Model (VSM) with K-means suffer from low accuracy, high computational cost, and difficulty in determining cluster centers. This paper proposes a micro-blog text vectorization method in which a Relevance Vector Machine (RVM) optimizes the VSM. First, the adaptive feature selection ability of the RVM is used to reduce the dimensionality of the VSM feature vectors. Next, Principal Component Analysis (PCA) is used to determine the initial cluster centers for K-means, and K-means then produces the clustering result. Finally, a heat index is defined from the numbers of micro-blog reposts, comments, and high-influence users; the topic with the largest heat index is taken as the current hot topic. Experiments on a real micro-blog text dataset show that, compared with two traditional methods, the proposed method improves accuracy by 7.3% and 1.1% and real-time performance by 45% and 53%, respectively.
Keywords: hot topic detection; vector space model; topic clustering; data dimensionality reduction; micro-blog
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
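The final step described in the abstract, ranking clustered topics by a heat index, can be illustrated with a minimal sketch. The abstract only says the index is defined from repost, comment, and high-influence user counts; it does not give the formula. The weighted sum below, the unit weights, and the names `TopicStats`, `heat_index`, and `hottest_topic` are all assumptions made for illustration, not the authors' definition.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical weights: the abstract gives no formula or coefficients,
# so equal weighting is assumed here purely for illustration.
W_REPOSTS = 1.0
W_COMMENTS = 1.0
W_INFLUENTIAL = 1.0


@dataclass
class TopicStats:
    """Aggregated counts for one topic cluster (hypothetical structure)."""
    name: str
    reposts: int
    comments: int
    influential_users: int


def heat_index(topic: TopicStats) -> float:
    """Assumed heat index: a weighted sum of the three counts."""
    return (
        W_REPOSTS * topic.reposts
        + W_COMMENTS * topic.comments
        + W_INFLUENTIAL * topic.influential_users
    )


def hottest_topic(topics: Sequence[TopicStats]) -> TopicStats:
    """Return the topic with the largest heat index."""
    if not topics:
        raise ValueError("at least one topic is required")
    return max(topics, key=heat_index)


if __name__ == "__main__":
    clusters = [
        TopicStats("a", reposts=10, comments=5, influential_users=1),
        TopicStats("b", reposts=3, comments=40, influential_users=2),
    ]
    print(hottest_topic(clusters).name)
```

In practice the weights would need to be tuned or replaced by whatever definition the paper itself gives; the sketch only shows the "score each cluster, then take the maximum" structure.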