Method of micro-blog event tracking based on word vector (基于词向量的微博事件追踪方法)  Cited by: 12


Authors: ZHANG Jiaming (张佳明)[1]; XI Yaoyi (席耀一)[1]; WANG Bo (王波)[1]; TANG Haohao (唐浩浩); LI Tiancai (李天彩)

Affiliation: [1] Institute of Information and System Engineering, PLA Information Engineering University, Zhengzhou 450001, China

Source: Computer Engineering and Applications (《计算机工程与应用》), 2016, No. 17, pp. 73-78, 117 (7 pages in total)

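Funding: National High Technology Research and Development Program of China (863 Program) (No. 2011AA7032030D); All-Army Military Postgraduate Research Funding Project (No. 2011JY002-158); National Social Science Foundation of China (No. 14BXW028)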

Abstract: Micro-blog posts are short, and new Internet words emerge constantly, so traditional methods perform poorly at micro-blog event tracking. To address this problem, a micro-blog event tracking method based on word vectors is proposed. Word vectors make it possible to compute the semantic similarity between words, and they also improve the accuracy of the semantic similarity computed between micro-blog posts. The method first trains a Skip-gram model on a large-scale dataset to obtain word vectors. It then builds representation models of the initial event and of each micro-blog post by extracting keywords. Finally, it uses the word vectors to compute the semantic similarity between each post and the initial event, and decides against a preset threshold whether the post belongs to the event, which completes the tracking. Experimental results show that, compared with traditional methods, the proposed method makes full use of the semantic information carried by word vectors and effectively improves event tracking performance.

Keywords: micro-blog; event tracking; short text; Skip-gram model; word vector; semantic information

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
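
The last two steps described in the abstract (keyword-based representation, then a thresholded similarity decision) can be illustrated with a minimal Python sketch. The abstract does not give the similarity formula, so everything here is an assumption made for illustration: the toy 3-dimensional vectors stand in for vectors trained with a Skip-gram model, the aggregation (average of each post keyword's best cosine match among the event keywords) is one plausible choice and not necessarily the authors', and the names `WORD_VECTORS`, `post_event_similarity`, `track`, and the threshold value `0.8` are hypothetical.

```python
import math

# Toy vectors standing in for Skip-gram output (values invented for illustration).
WORD_VECTORS = {
    "earthquake": [0.9, 0.1, 0.0],
    "quake": [0.85, 0.15, 0.05],
    "rescue": [0.7, 0.3, 0.1],
    "concert": [0.0, 0.2, 0.9],
    "ticket": [0.1, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def post_event_similarity(post_keywords, event_keywords, vectors=WORD_VECTORS):
    """Assumed aggregation: mean over post keywords of the best cosine
    match among the event keywords. Out-of-vocabulary words are skipped."""
    event_vecs = [vectors[w] for w in event_keywords if w in vectors]
    post_vecs = [vectors[w] for w in post_keywords if w in vectors]
    if not event_vecs or not post_vecs:
        return 0.0
    best_matches = [max(cosine(p, e) for e in event_vecs) for p in post_vecs]
    return sum(best_matches) / len(best_matches)


def track(post_keywords, event_keywords, threshold=0.8):
    """Return True when the post is judged to belong to the event."""
    return post_event_similarity(post_keywords, event_keywords) >= threshold


if __name__ == "__main__":
    event = ["earthquake", "rescue"]
    print(track(["quake", "rescue"], event))    # semantically close post
    print(track(["concert", "ticket"], event))  # unrelated post
```

Note that "quake" never appears among the event keywords, yet the post can still match because its vector lies close to "earthquake". This is the kind of semantic information that exact keyword overlap would miss. In practice the vectors would come from a trained model rather than a hand-written dictionary.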

 
