A Microblog Hot Topic Mining Method Integrating Tag Semantics (一种融合标签语义的微博热点话题挖掘方法)

Cited by: 3

Authors: ZHOU Fuxing (周福星); CHEN Xiuzhen (陈秀真) [1,2]; MA Jin (马进) [1,2]; LI Shenghong (李生红) [1,2]

Affiliations: [1] School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; [2] Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai 200240, China

Source: Computer Engineering (《计算机工程》), 2019, No. 10, pp. 283-287 (5 pages)

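Funding: National Natural Science Foundation of China (61562004, 61431008); National Key Research and Development Program of China, "Cyberspace Security" (2016YFB0801003)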

Abstract: Because microblog texts are short, applying the Latent Dirichlet Allocation (LDA) model to them directly yields high-dimensional, sparse feature vectors. To address this, a hot topic mining method that integrates tag semantics is proposed. The common block algorithm is used to compute the similarity between microblog tags, and microblog texts whose tags are highly similar are merged. The merged texts are then modeled with LDA, and hot topics are mined with the K-means clustering algorithm. Experimental results show that, compared with modeling each microblog text individually and with directly merging only texts that share an identical tag, the proposed method achieves lower perplexity and higher accuracy in mining hot topics.
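The merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the common block algorithm, so this example substitutes Python's `difflib.SequenceMatcher`, which scores two strings by their matching blocks; it is a stand-in, not the authors' measure. The function names, the greedy grouping strategy, the `0.6` threshold, and the sample posts are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def tag_similarity(a: str, b: str) -> float:
    """Stand-in similarity: 2 * matched characters / total characters."""
    return SequenceMatcher(None, a, b).ratio()


def merge_by_tag(posts, threshold=0.6):
    """Greedily group (tag, text) pairs whose tags are similar.

    Each post joins the first group whose representative tag scores at
    least `threshold` (an assumed value); otherwise it starts a new group.
    """
    groups = []  # list of (representative_tag, [texts])
    for tag, text in posts:
        for rep, texts in groups:
            if tag_similarity(tag, rep) >= threshold:
                texts.append(text)
                break
        else:
            groups.append((tag, [text]))
    # One merged pseudo-document per group, ready for topic modeling.
    return [(rep, " ".join(texts)) for rep, texts in groups]


# Hypothetical sample data: (tag, microblog text) pairs.
posts = [
    ("worldcup2018", "great match tonight"),
    ("worldcup", "who wins the final"),
    ("typhoon", "heavy rain in the city"),
]
merged = merge_by_tag(posts)
```

In the pipeline the abstract outlines, merged pseudo-documents like these would next be modeled with LDA, and the resulting document-topic vectors clustered with K-means to surface hot topics.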

Keywords: microblog text; Latent Dirichlet Allocation (LDA) model; tag semantics; common block; K-means clustering

Classification No.: TP393 [Automation and Computer Technology - Computer Application Technology]
