Semantic distribution similarity based topic model

Times cited: 2

Authors: Ju Yaya, Yang Lu [1], Yan Jianfeng [1] (School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu 215006, China)

Affiliation: [1] School of Computer Science and Technology, Soochow University

Source: Application Research of Computers, 2019, No. 12, pp. 3553-3557 (5 pages)

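Funding: National Natural Science Foundation of China (61572339, 61272449); Key Project of the Jiangsu Province Science and Technology Support Program (BE2014005)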

Abstract: Latent Dirichlet allocation (LDA) is built on the bag-of-words (BOW) model. This simplifies modeling, but it also makes the resulting topics less semantically coherent and limits how well the model represents documents. To address this problem, this paper proposes a topic model based on semantic distribution similarity. Within the expectation maximization (EM) framework, the model uses the generalized Pólya urn (GPU) model to incorporate word-word and document-topic semantic distribution similarity. These similarities guide topic modeling and, at the level of semantic association, weaken the effect of the bag-of-words assumption on the generated topics. Experiments on four public datasets show that the proposed model outperforms currently popular topic modeling algorithms in topic semantic coherence and text classification accuracy. It also improves convergence speed and model accuracy.
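The abstract describes the mechanism only at a high level. The following is a minimal sketch, not the authors' algorithm. It shows one way a generalized Pólya urn (GPU) "promotion" can be folded into EM updates for LDA. When the expected count of word w in topic k is accumulated, an extra amount S[w, w'] is also credited to semantically similar words w'.

Several elements are illustrative assumptions:
- the function name `gpu_em_lda`;
- the user-supplied word-word similarity matrix `S`;
- the add-alpha/add-beta smoothing in the M-step;
- all parameter values.

The document-topic similarity term mentioned in the abstract is not modeled here.

```python
import numpy as np


def gpu_em_lda(docs, V, K, S, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Hypothetical EM for LDA with GPU-style word promotion (illustrative only).

    docs: list of documents, each a list of word ids in [0, V)
    S:    V x V promotion matrix; S[w, w2] is the extra count credited to
          word w2 whenever one unit of count is assigned to word w
          (S = identity reduces to plain EM for LDA)
    """
    rng = np.random.default_rng(seed)
    D = len(docs)

    # Document-word count matrix (bag-of-words input).
    X = np.zeros((D, V))
    for d, doc in enumerate(docs):
        for w in doc:
            X[d, w] += 1

    # Random positive initialisation of document-topic and topic-word distributions.
    theta = rng.dirichlet(np.ones(K), size=D)  # D x K
    phi = rng.dirichlet(np.ones(V), size=K)    # K x V

    for _ in range(iters):
        # E-step: responsibility of topic k for word w in document d,
        # proportional to theta[d, k] * phi[k, w].
        r = theta[:, None, :] * phi.T[None, :, :]  # D x V x K
        r /= r.sum(axis=2, keepdims=True)

        # Expected counts weighted by how often each word occurs.
        n_dwk = X[:, :, None] * r     # D x V x K
        n_dk = n_dwk.sum(axis=1)      # D x K
        n_kw = n_dwk.sum(axis=0).T    # K x V

        # GPU promotion (assumed form): mass on word w in topic k is
        # also spread to similar words w2 in proportion to S[w, w2].
        n_kw = n_kw @ S               # K x V

        # M-step with simple additive Dirichlet smoothing.
        theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
        phi = (n_kw + beta) / (n_kw + beta).sum(axis=1, keepdims=True)

    return theta, phi
```

Usage example: with a 4-word vocabulary where words 0 and 1 are assumed to be related, set `S = np.eye(4)` and then `S[0, 1] = S[1, 0] = 0.3`. Calling `gpu_em_lda([[0, 1, 1, 2], [2, 3, 3], [0, 1, 3]], V=4, K=2, S=S)` then pulls words 0 and 1 toward the same topics.

Expressing promotion as a single matrix product keeps the update cheap. The trade-off is that a dense V x V matrix does not scale to large vocabularies, where a sparse `S` would be needed.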

Keywords: latent Dirichlet allocation; semantic distribution similarity; topic model; GPU model

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
