Individual microblog clustering by semantic correlation based on HowNet  Cited by: 3


Authors: GAO Yong-bing[1], SONG Tian-shu, LI Jiang-yu, MA Zhan-fei[3]

Affiliations: [1] School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, Inner Mongolia; [2] School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin 541004, Guangxi; [3] Department of Computer Science, Baotou Teachers' College, Baotou 014030, Inner Mongolia, China

Source: Computer Engineering & Science (计算机工程与科学), 2019, No. 6, pp. 1128-1135 (8 pages)

Funding: National Natural Science Foundation of China (61762071); Natural Science Foundation of Inner Mongolia Autonomous Region (2015MS0621)

Abstract: Clustering an individual's highly correlated microblog posts helps readers quickly understand the blogger's professional interests and experience. Existing short-text clustering methods do not adequately account for semantics or sentence-level correlation. This paper proposes a HowNet-based method for clustering individual microblogs by semantic correlation. The main steps are: (1) train Skip-gram on a large corpus of microblog text to generate word vectors; (2) disambiguate the words within each sentence according to their sememes; (3) compute the word-level and sentence-level similarity between an individual's microblog posts separately, then combine the two into a post correlation score; (4) cluster the individual's microblog posts according to that correlation score. Experiments show that the proposed algorithm is noticeably more accurate than hierarchical clustering and density-based clustering.
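
The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity formulas, so everything here is an assumption made for illustration. That includes the toy two-dimensional vectors standing in for Skip-gram output, the best-match word-level score, the mean-vector sentence-level score, the mixing weight `alpha`, and the threshold-based grouping. The HowNet sememe disambiguation of step (2) is omitted because it needs the HowNet lexicon.

```python
import math

# Made-up 2-d vectors standing in for Skip-gram word vectors (assumption).
VECTORS = {
    "python": (0.9, 0.1), "code": (0.8, 0.2), "bug": (0.85, 0.15),
    "hiking": (0.1, 0.9), "mountain": (0.2, 0.8), "trail": (0.15, 0.85),
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def word_level_similarity(post_a, post_b):
    """Hypothetical word-level score: mean best-match cosine, symmetrised."""
    def directed(src, dst):
        best = (max(cosine(VECTORS[w], VECTORS[x]) for x in dst) for w in src)
        return sum(best) / len(src)
    return 0.5 * (directed(post_a, post_b) + directed(post_b, post_a))

def sentence_level_similarity(post_a, post_b):
    """Hypothetical sentence-level score: cosine of the mean word vectors."""
    def mean_vec(post):
        dims = len(VECTORS[post[0]])
        return tuple(sum(VECTORS[w][d] for w in post) / len(post) for d in range(dims))
    return cosine(mean_vec(post_a), mean_vec(post_b))

def correlation(post_a, post_b, alpha=0.5):
    """Combine both scores; the weight alpha is an illustrative assumption."""
    return (alpha * word_level_similarity(post_a, post_b)
            + (1 - alpha) * sentence_level_similarity(post_a, post_b))

def cluster(posts, threshold=0.9):
    """Group posts linked by correlation >= threshold (union-find, single link)."""
    parent = list(range(len(posts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            if correlation(posts[i], posts[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(posts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Each post is a list of already-segmented words (segmentation is assumed).
posts = [["python", "code"], ["code", "bug"],
         ["hiking", "mountain"], ["mountain", "trail"]]
clusters = cluster(posts)
```

With these toy vectors the two programming posts and the two hiking posts fall into separate groups. In practice the vectors would come from a Skip-gram model trained on microblog text, and the threshold and weight would need tuning on labeled data.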

Keywords: individual microblog; HowNet; semantics; clustering; disambiguation

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
