Research on Ordered Feature Extraction of Microblog Topics Based on the LDA-WO Hybrid Model (cited by: 3)

An Ordered Feature Extraction Method of Microblog Based on a Scalable Topic LDA-WO Mixed Models

Source: Information Science (《情报科学》), 2017, No. 7, pp. 44-49, 55 (7 pages)

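Funding: National Natural Science Foundation of China, General Program (71373123); Key Project of Philosophy and Social Science Research in Jiangsu Universities (2015ZDIXM007); Jiangsu Province Graduate Research and Innovation Program for Regular Universities (KYZZ15_0104); Fundamental Research Funds for the Central Universities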

Abstract: [Purpose/significance] When the LDA model is used for topic extraction, the extracted feature words are unordered. This destroys the original subject-verb-object structure and makes the results inaccurate and hard to read. To address this, the paper constructs a word-order model (WO for short), combines it with the LDA model, and proposes an ordered feature extraction algorithm for microblog topics based on the LDA-WO hybrid model. [Method/process] LDA is used for topic modeling to obtain unordered feature words. The WO model then orders them: the feature words are compared against the original corpus to construct a feature word-corpus position matrix, the positions are sorted to construct a feature word order weight matrix, and the weights yield the final ordered feature words, completing the ordered extraction of topic features. [Result/conclusion] Experiments on real Sina Weibo data show that feature words extracted with the LDA-WO method are more readable, which can compensate for the weaker topic interpretability of the traditional LDA model.
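The abstract does not give the WO model's formulas, so the following is only a minimal sketch of the ordering step under stated assumptions: the unordered feature words are taken as already produced by some LDA implementation, the "position" of a word in a document is assumed to be the index of its first occurrence, and the "order weight" is assumed to be the word's average within-document rank. The function names `position_matrix`, `order_weights`, and `order_features` are hypothetical and do not come from the paper.

```python
from typing import List, Optional


def position_matrix(features: List[str], docs: List[List[str]]) -> List[List[Optional[int]]]:
    """One row per feature word, one column per document.

    Each cell holds the index of the word's first occurrence in that
    tokenized document, or None if the word is absent (an assumption;
    the paper may define position differently).
    """
    return [[doc.index(w) if w in doc else None for doc in docs] for w in features]


def order_weights(pos: List[List[Optional[int]]]) -> List[float]:
    """Average within-document rank of each feature word (lower = earlier).

    Words that never occur in the corpus get an infinite weight so that
    they sort last.
    """
    n_words = len(pos)
    n_docs = len(pos[0]) if pos else 0
    rank_sums = [0.0] * n_words
    counts = [0] * n_words
    for d in range(n_docs):
        # Rank the feature words present in document d by their position.
        present = sorted((pos[i][d], i) for i in range(n_words) if pos[i][d] is not None)
        for rank, (_, i) in enumerate(present):
            rank_sums[i] += rank
            counts[i] += 1
    return [rank_sums[i] / counts[i] if counts[i] else float("inf") for i in range(n_words)]


def order_features(features: List[str], docs: List[List[str]]) -> List[str]:
    """Return the feature words sorted by their assumed order weight."""
    weights = order_weights(position_matrix(features, docs))
    # sorted() is stable, so ties keep the original LDA output order.
    return [w for _, w in sorted(zip(weights, features), key=lambda p: p[0])]


if __name__ == "__main__":
    # Toy English tokens stand in for segmented microblog text.
    unordered = ["prize", "wins", "team"]
    corpus = [
        ["team", "wins", "prize"],
        ["the", "team", "wins"],
        ["team", "wins", "big", "prize"],
    ]
    print(order_features(unordered, corpus))
```

Averaging ranks rather than raw positions keeps long and short posts on the same scale; this is a design choice of the sketch, not a claim about the published method.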

Keywords: WO-LDA model; microblog topic; ordered feature extraction; word order

Classification: G206.3 [Cultural Science: Communication Studies]
