SLDA-TC: A Novel Text Categorization Approach Based on Supervised Topic Model  Cited by: 11

Authors: TANG Huan-ling [1,2,3]; DOU Quan-sheng; YU Li-ping [1,2,3]; SONG Ying-jie; LU Ming-yu [4]

Affiliations: [1] School of Computer Science and Technology, Shandong Technology and Business University, Yantai, Shandong 264005, China; [2] Co-innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, Yantai, Shandong 264005, China; [3] Key Laboratory of Intelligent Information Processing in Universities of Shandong (Shandong Technology and Business University), Yantai, Shandong 264005, China; [4] Information Science and Technology College, Dalian Maritime University, Dalian, Liaoning 116026, China

Source: Acta Electronica Sinica, 2019, No. 6, pp. 1300-1308 (9 pages)

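Funding: National Natural Science Foundation of China (No. 61175053, No. 61472227, No. 61773244, No. 61602277, No. 61772319); Scientific Research Program of Shandong Province Universities (No. J18KA385)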

Abstract: This paper proposes SLDA-TC (Supervised LDA-Text Categorization), a text classification method based on a supervised topic model. The model introduces a topic-category probability distribution parameter that captures the semantic relationship between topics and categories. A new topic sampling method, SLDA-TC-Gibbs, is also proposed: when sampling the latent topic of a word, it uses only the other training documents that belong to the same category as the document containing that word, and a theoretical derivation is given. In addition, the number of topics needs to be only slightly larger than the number of categories. Experiments show that, compared with LDA-TC (LDA-Text Categorization) and SVM, the method improves both classification accuracy and running time.
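The abstract gives no formulas, so the following is only a minimal sketch of the general idea of class-restricted topic sampling, not the SLDA-TC-Gibbs update from the paper. It assumes a standard collapsed Gibbs update for LDA in which the topic-word counts are kept separately per category, so a token only sees counts from documents that share its document's category. All names (`class_restricted_gibbs`, `topic_category_distribution`, `alpha`, `beta`, `smoothing`) and the count-based estimate of the topic-category distribution are illustrative assumptions.

```python
import random


def class_restricted_gibbs(docs, labels, n_topics, n_classes, vocab_size,
                           alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs sampling with per-category topic-word counts.

    docs: list of documents, each a list of integer word ids.
    labels: category id of each document.
    Returns (z, n_ck): topic assignments per token, and the number of
    tokens in category c assigned to topic k.
    ASSUMPTION: this is a generic LDA-style update, not the paper's formula.
    """
    rng = random.Random(seed)
    # n_ckw[c][k][w]: tokens of word w with topic k in documents of category c.
    n_ckw = [[[0] * vocab_size for _ in range(n_topics)]
             for _ in range(n_classes)]
    n_ck = [[0] * n_topics for _ in range(n_classes)]
    n_dk = [[0] * n_topics for _ in docs]

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        c = labels[d]
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_ckw[c][k][w] += 1
            n_ck[c][k] += 1
            n_dk[d][k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            c = labels[d]
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_ckw[c][k][w] -= 1
                n_ck[c][k] -= 1
                n_dk[d][k] -= 1
                # Unnormalized conditional, using same-category counts only.
                weights = [
                    (n_dk[d][t] + alpha) * (n_ckw[c][t][w] + beta)
                    / (n_ck[c][t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_ckw[c][k][w] += 1
                n_ck[c][k] += 1
                n_dk[d][k] += 1
    return z, n_ck


def topic_category_distribution(n_ck, smoothing=1.0):
    """Estimate p(category | topic) from counts (an assumed estimator)."""
    n_classes, n_topics = len(n_ck), len(n_ck[0])
    dist = []
    for k in range(n_topics):
        column = [n_ck[c][k] + smoothing for c in range(n_classes)]
        total = sum(column)
        dist.append([v / total for v in column])
    return dist
```

In this sketch, restricting the counts to one category shrinks the statistics each update touches, which is one plausible reading of why the abstract reports a speed gain. The paper's own derivation should be consulted for the actual sampling formula.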

Keywords: text classification; topic model; latent Dirichlet allocation; Gibbs sampling

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
