A New Text Classification Algorithm Based on the Labeled-LDA Model (cited by: 103)

Text Classification Based on Labeled-LDA Model

Source: Chinese Journal of Computers (《计算机学报》), 2008, No. 4, pp. 620-627 (8 pages)

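Funding: Supported by the National Natural Science Foundation of China (60773027); the Key Program of the National Natural Science Foundation of China (60736044); and the Key Project of the National High Technology Research and Development Program of China ("863" Program) (2006AA010108).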

Abstract: Latent Dirichlet Allocation (LDA) is a recently proposed unsupervised learning model that extracts latent topics from text. This paper proposes Labeled-LDA, an LDA model augmented with class labels, obtained by integrating document class information into the traditional LDA model. Based on Labeled-LDA, the allocation of latent topics can be computed jointly across all classes, which overcomes the forced allocation of latent topics that occurs when traditional LDA is used as a component of a classification framework. Experimental comparison with the traditional LDA model shows that the new Labeled-LDA-based text classification algorithm effectively improves classification performance: micro-F1 improves by about 5.7% on the Fudan University Chinese corpus and by about 3% on the comp subset of the English 20newsgroup corpus.
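The micro-F1 measure cited in the abstract pools true positives, false positives, and false negatives over all classes before computing precision and recall. The following is a minimal Python sketch of that computation, assuming single-label predictions; the function name `micro_f1` and the toy labels are illustrative and do not come from the paper.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool per-class counts, then compute P, R, and F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted but is wrong
            fn[t] += 1      # class t was missed

    # Pool the counts over all classes (the "micro" step).
    tp_sum = sum(tp.values())
    fp_sum = sum(fp.values())
    fn_sum = sum(fn.values())
    if tp_sum == 0:
        return 0.0

    precision = tp_sum / (tp_sum + fp_sum)
    recall = tp_sum / (tp_sum + fn_sum)
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical labels: 3 of 4 documents are classified correctly.
score = micro_f1(["a", "a", "b", "c"], ["a", "b", "b", "c"])
```

Under the single-label assumption every misclassified document contributes one false positive and one false negative, so micro-averaged precision, recall, and F1 all coincide with accuracy.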

Keywords: text classification; graphical model; latent Dirichlet allocation; variational inference

Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
