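Authors: ZHANG Pei-yao (张佩瑶); LIU Dong-su (刘东苏) [1] (School of Economics and Management, Xidian University, Xi'an 710126, China)
Affiliation: [1] School of Economics and Management, Xidian University
Source: Information Science (《情报科学》), 2019, No. 7, pp. 61-64, 71 (5 pages)
Funding: National Natural Science Foundation Youth Fund project "Research on Community Detection Algorithms for Large-Scale Dynamic Social Networks" (71401130)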
Abstract: 【Purpose/significance】In the mobile Internet era, microblogging, being fast and convenient, has quickly become one of the main platforms for spreading and sharing information. As information spreads online, the focus of a topic's content shifts over time, and detecting these shifts promptly and accurately helps in understanding how online public opinion evolves. 【Method/process】First, a focus-word extraction formula based on the distribution of focus feature words is defined, and a set of focus feature words is constructed. Next, word vectors are trained on a large-scale corpus with the Skip-gram model, the texts are modeled with BTM, and topic word vectors are constructed directly on the BTM topic dimensions in combination with the focus feature word set. Finally, the similarity between topic feature words is calculated and applied in a clustering algorithm to identify topic focuses. 【Result/conclusion】Experiments on a Sina Weibo dataset show that the proposed method makes full use of the semantic information introduced by word vectors, improves text clustering, and effectively captures the topic focus at each stage.
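The last two steps of the method (building topic vectors from word vectors, then clustering by similarity) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the probability-weighted average in `topic_vector`, the use of cosine similarity, the greedy single-pass clustering with a `threshold` of 0.8, and the toy vectors and topic distributions. In practice the word vectors would come from a trained Skip-gram model and the topic-word probabilities from a fitted BTM.

```python
import math


def topic_vector(topic_word_probs, word_vectors, focus_words):
    """Probability-weighted average of the embeddings of a topic's focus words.

    Hypothetical construction: the abstract only says topic word vectors are
    built on the BTM topic dimensions using the focus feature word set.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, prob in topic_word_probs.items():
        if word in focus_words and word in word_vectors:
            for i, x in enumerate(word_vectors[word]):
                total[i] += prob * x
            weight_sum += prob
    if weight_sum == 0.0:
        return total  # no usable focus word: return the zero vector
    return [x / weight_sum for x in total]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_by_similarity(vectors, threshold=0.8):
    """Greedy single-pass clustering (an assumed stand-in for the paper's
    clustering algorithm): each vector joins the first cluster whose first
    member is at least `threshold` similar, otherwise it starts a new one."""
    clusters = []
    for idx, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Toy 2-dimensional "embeddings" standing in for Skip-gram output.
word_vectors = {
    "earthquake": [1.0, 0.0],
    "rescue": [0.9, 0.1],
    "donation": [0.0, 1.0],
    "charity": [0.1, 0.9],
}
focus_words = set(word_vectors)

# Toy topic-word probabilities standing in for BTM output.
topics = [
    {"earthquake": 0.6, "rescue": 0.4},
    {"rescue": 0.7, "earthquake": 0.3},
    {"donation": 0.5, "charity": 0.5},
]

vectors = [topic_vector(t, word_vectors, focus_words) for t in topics]
clusters = cluster_by_similarity(vectors)
print(clusters)  # [[0, 1], [2]]
```

With these toy inputs the two disaster-related topics fall into one cluster and the donation topic into another, which is the kind of grouping the abstract describes as a topic focus.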