Authors: GAO Yong-bing (高永兵) [1], SONG Tian-shu (宋添树), LI Jiang-yu (李江宇), MA Zhan-fei (马占飞) [3]
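Affiliations: [1] School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, Inner Mongolia, China; [2] School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin 541004, Guangxi, China; [3] Department of Computer, Baotou Teachers' College, Baotou 014030, Inner Mongolia, China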
Source: Computer Engineering & Science (《计算机工程与科学》), 2019, No. 6, pp. 1128-1135 (8 pages)
Funding: National Natural Science Foundation of China (61762071); Natural Science Foundation of Inner Mongolia Autonomous Region (2015MS0621)
Abstract: Clustering an individual's closely related microblog posts helps readers quickly understand the blogger's professional interests and experience. Existing short-text clustering methods do not give enough consideration to semantics or to sentence-level relatedness. We propose a HowNet-based method that clusters individual microblog posts by semantic relatedness. The main steps are as follows: (1) train Skip-gram on a large corpus of microblog text to generate word vectors; (2) disambiguate the words within each sentence according to their sememes; (3) compute word-level and sentence-level similarity between an individual's posts separately, then combine the two into a post relatedness score; (4) cluster the individual's posts according to that relatedness. Experiments show that the proposed method is clearly more accurate than hierarchical clustering and density-based clustering.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
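Steps (3) and (4) of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no formulas, so the best-match word similarity, the mean-vector sentence similarity, the weight `alpha`, the `threshold`, and the greedy single-pass clustering are all assumptions chosen for illustration. The toy `vectors` table stands in for Skip-gram output, and the sememe-based disambiguation of step (2) is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def word_level_similarity(words_a, words_b, vectors):
    """Assumed word-level measure: average best-match cosine, symmetrised."""
    def directed(src, dst):
        scores = [
            max((cosine(vectors[s], vectors[d]) for d in dst if d in vectors),
                default=0.0)
            for s in src if s in vectors
        ]
        return sum(scores) / len(scores) if scores else 0.0
    return 0.5 * (directed(words_a, words_b) + directed(words_b, words_a))

def sentence_level_similarity(words_a, words_b, vectors):
    """Assumed sentence-level measure: cosine between mean word vectors."""
    def mean_vec(words):
        vecs = [vectors[w] for w in words if w in vectors]
        if not vecs:
            return None
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    ma, mb = mean_vec(words_a), mean_vec(words_b)
    return cosine(ma, mb) if ma is not None and mb is not None else 0.0

def relatedness(words_a, words_b, vectors, alpha=0.5):
    """Combine the two similarities; alpha is a hypothetical mixing weight."""
    return (alpha * word_level_similarity(words_a, words_b, vectors)
            + (1 - alpha) * sentence_level_similarity(words_a, words_b, vectors))

def cluster(posts, vectors, threshold=0.8, alpha=0.5):
    """Greedy single-pass clustering (an assumption, not the paper's algorithm):
    a post joins the first cluster whose first member is related enough,
    otherwise it starts a new cluster. Returns lists of post indices."""
    clusters = []
    for i, post in enumerate(posts):
        for members in clusters:
            if relatedness(posts[members[0]], post, vectors, alpha) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy 2-d "word vectors" standing in for Skip-gram output.
vectors = {
    "python": [1.0, 0.0], "code": [0.9, 0.1],
    "travel": [0.0, 1.0], "beach": [0.1, 0.9],
}
posts = [["python", "code"], ["code"], ["travel", "beach"], ["beach"]]
print(cluster(posts, vectors))  # [[0, 1], [2, 3]]
```

Comparing each post against only the first member of a cluster keeps the sketch short; a hierarchical or density-based method, the baselines named in the abstract, would consume the same pairwise `relatedness` scores.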