Authors: ZHANG Jiaming (张佳明) [1]; XI Yaoyi (席耀一) [1]; WANG Bo (王波) [1]; TANG Haohao (唐浩浩); LI Tiancai (李天彩)
Affiliation: [1] Institute of Information and System Engineering, PLA Information Engineering University, Zhengzhou 450001, China
Source: Computer Engineering and Applications (《计算机工程与应用》), 2016, No. 17, pp. 73-78, 117 (7 pages)
Funding: National High Technology Research and Development Program of China (863 Program) (No. 2011AA7032030D); PLA Military Postgraduate Research Funding Project (No. 2011JY002-158); National Social Science Fund of China (No. 14BXW028)
Abstract: Micro-blog posts are short, and new Internet words emerge constantly, so traditional methods perform poorly on micro-blog event tracking. To address this problem, a micro-blog event tracking method based on word vectors is proposed. Word vectors make it possible to compute the semantic similarity between words, and they also improve the accuracy of the semantic similarity computed between micro-blog posts. The method first trains a Skip-gram model on a large-scale dataset to obtain word vectors. It then builds representation models for the initial event and for each micro-blog post by extracting keywords. Finally, it uses the word vectors to compute the semantic similarity between each post and the initial event and decides, against a preset threshold, whether the post belongs to the event, which completes the tracking task. Experimental results show that, compared with traditional methods, the proposed method makes full use of the semantic information carried by word vectors and effectively improves micro-blog event tracking performance.
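Keywords: micro-blog; event tracking; short text; Skip-gram model; word vector; semantic information
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]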
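
The final step described in the abstract (scoring a post against the initial event with word vectors, then applying a threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy 3-dimensional vectors stand in for trained Skip-gram vectors, the aggregation rule (average of each post keyword's best cosine match among the event keywords) is just one plausible choice, and the names `keyword_set_similarity`, `is_on_event`, and the threshold value 0.8 are hypothetical.

```python
import math

# Toy word vectors with made-up values (assumption, for illustration only).
# In the described method these would come from a Skip-gram model
# trained on a large-scale dataset.
WORD_VECTORS = {
    "earthquake": [0.9, 0.1, 0.0],
    "quake": [0.8, 0.2, 0.1],
    "rescue": [0.6, 0.5, 0.1],
    "concert": [0.0, 0.2, 0.9],
    "tickets": [0.1, 0.1, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def keyword_set_similarity(post_keywords, event_keywords, vectors=WORD_VECTORS):
    """Hypothetical aggregation: for each post keyword, take its best cosine
    match among the event keywords, then average those best matches.
    Keywords without a vector are skipped."""
    post = [w for w in post_keywords if w in vectors]
    event = [w for w in event_keywords if w in vectors]
    if not post or not event:
        return 0.0
    best = [max(cosine(vectors[p], vectors[e]) for e in event) for p in post]
    return sum(best) / len(best)


def is_on_event(post_keywords, event_keywords, threshold=0.8):
    """Threshold decision: the post is tracked to the event if its
    similarity reaches the (assumed) threshold."""
    return keyword_set_similarity(post_keywords, event_keywords) >= threshold
```

With these toy vectors, a post with keywords `["quake", "rescue"]` is matched to an event described by `["earthquake"]` even though the two share no literal word, while a post with keywords `["concert", "tickets"]` is rejected. That is the kind of semantic matching the abstract attributes to word vectors, in contrast to exact keyword overlap.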