Microblog topics detection based on SOM clustering  Cited by: 10


Authors: Song Lina, Feng Xupeng, Liu Lijun[1], Huang Qingsong[1,3]

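Affiliations: [1] Faculty of Information Engineering & Automation, Kunming University of Science & Technology, Kunming 650500, China; [2] Educational Technology & Network Center, Kunming University of Science & Technology, Kunming 650500, China; [3] Yunnan Provincial Key Laboratory of Computer Technology Applications, Kunming University of Science & Technology, Kunming 650500, China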

Source: Application Research of Computers, 2018, No. 3, pp. 671-674, 679 (5 pages)

Funding: National Natural Science Foundation of China (81360230; 81560296)

Abstract: As the number of microblog users grows, information on microblog platforms is updated frequently. Microblog text is characterized by data sparsity, a large number of new words, and non-standard language. To address these characteristics, this paper proposes a microblog topic detection method based on SOM clustering. The short texts in the raw corpus are first preprocessed, and their features are then extracted with a word vector model, which reduces the heavy computational load caused by high vector dimensionality. Topics are then clustered with an improved SOM algorithm, which overcomes shortcomings of traditional text clustering and can therefore detect topics effectively. Experimental results show that, compared with traditional text clustering algorithms, the proposed algorithm clearly improves the overall F-measure (F value).
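The abstract names the pipeline (word-vector features for short texts, SOM clustering, evaluation by an F value) but not the authors' improvement to SOM or their exact F definition. The following is only a minimal sketch under stated assumptions: a short text is represented as the mean of its word vectors, clustering uses a plain Kohonen SOM rather than the paper's improved variant, and the score is one common clustering F-measure (the best-matching F per true topic, weighted by topic size). The toy embeddings, the function names `doc_vector`, `train_som`, `assign` and `clustering_f_measure`, and all parameter values are hypothetical.

```python
import math
import random


def doc_vector(tokens, embeddings):
    """Represent a short text as the mean of its known word vectors.

    Out-of-vocabulary tokens are skipped; a text with no known tokens
    maps to the zero vector.
    """
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data, rows=1, cols=2, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a plain Kohonen SOM (NOT the paper's improved variant).

    Returns {(row, col): weight vector}.
    """
    rng = random.Random(seed)
    # Initialise every node with a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1 - frac)               # learning rate decays linearly
            sigma = sigma0 * (1 - frac) + 1e-3  # neighbourhood radius shrinks
            bmu = min(weights, key=lambda n: _sq_dist(weights[n], x))
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def assign(data, weights):
    """Cluster label of each vector = grid position of its best-matching unit."""
    return [min(weights, key=lambda n: _sq_dist(weights[n], x)) for x in data]


def clustering_f_measure(labels, clusters):
    """For each true topic take the best F over clusters, weighted by topic size."""
    n = len(labels)
    total = 0.0
    for topic in set(labels):
        n_topic = labels.count(topic)
        best = 0.0
        for k in set(clusters):
            n_k = clusters.count(k)
            n_both = sum(1 for t, c in zip(labels, clusters) if t == topic and c == k)
            if n_both:
                p, r = n_both / n_k, n_both / n_topic
                best = max(best, 2 * p * r / (p + r))
        total += n_topic / n * best
    return total


# Hypothetical toy 2-D "word vectors"; a real system would load trained embeddings.
EMBEDDINGS = {
    "earthquake": [1.0, 0.0], "rescue": [0.9, 0.1], "aftershock": [1.0, 0.1],
    "match": [0.0, 1.0], "goal": [0.1, 0.9], "league": [0.0, 0.9],
}
DOCS = [["earthquake", "rescue"], ["aftershock", "rescue"], ["earthquake", "aftershock"],
        ["match", "goal"], ["league", "goal"], ["match", "league"]]
TOPICS = ["quake", "quake", "quake", "sport", "sport", "sport"]

if __name__ == "__main__":
    vectors = [doc_vector(d, EMBEDDINGS) for d in DOCS]
    clusters = assign(vectors, train_som(vectors))
    print(clusters, clustering_f_measure(TOPICS, clusters))
```

Averaging word vectors keeps every text at the embedding dimension regardless of vocabulary size, which is the kind of dimensionality reduction the abstract attributes to the word vector model; the grid size, decay schedules and initialisation here are illustrative choices, not values taken from the paper.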

Keywords: topic detection; word vector model; text similarity; short text; SOM clustering

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]

 
