Semi-Supervised Text Classification Based on Variational Topic Model

Authors: ZHAO Shu’an [1,2]; ZHOU Muchun
Affiliations: [1] School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094, China; [2] School of Information Engineering, Jiangsu Open University, Nanjing, Jiangsu 210036, China

Source: Chinese Journal of Electron Devices, 2023, No. 2, pp. 463-468 (6 pages)

Abstract: To address the shortage of labeled data commonly encountered in practical applications, a semi-supervised text classification model based on a variational topic model is proposed. First, an unsupervised variational topic model extracts semantically concentrated document-topic distributions, which serve as effective document feature representations; a classifier is then trained on these features in a semi-supervised manner. Compared with traditional topic models, the neural-network-based variational topic model not only produces reasonable topics but also performs inference faster. Experimental results on 20NewsGroup and other datasets show that the proposed model, using only 30% of the training data, achieves results comparable to or better than those of a semi-supervised baseline model using 90% of the training data, which demonstrates the validity and practicality of the proposed model.
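The two-stage pipeline summarized in the abstract (a variational topic model whose document-topic distribution feeds a semi-supervised classifier) can be illustrated with the minimal, dependency-free Python sketch below. It is not the authors' implementation: the abstract names neither the encoder architecture nor the semi-supervised algorithm, so the one-layer encoder, the hand-set weights (which would normally be learned by maximizing the evidence lower bound), the nearest-centroid classifier, and the self-training loop with a 0.9 confidence threshold are all assumptions chosen for illustration. The sketch also suggests one plausible reason for the faster inference: a new document's topic distribution comes from a single encoder pass (amortized inference) instead of an iterative per-document procedure such as Gibbs sampling.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def encode_theta(bow, w_mu, w_logvar, rng=None):
    """One encoder pass: bag-of-words counts -> document-topic distribution theta.

    Hypothetical simplification: a single linear layer (vocab x topics weights)
    stands in for a trained MLP encoder. rng=None uses the posterior mean.
    """
    n_topics = len(w_mu[0])
    mu = [sum(c * w_mu[v][k] for v, c in enumerate(bow)) for k in range(n_topics)]
    logvar = [sum(c * w_logvar[v][k] for v, c in enumerate(bow)) for k in range(n_topics)]
    if rng is None:
        z = mu
    else:
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1)
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    # Softmax maps the Gaussian sample onto the topic simplex (logistic-normal)
    return softmax(z)


def centroids(features, labels, n_classes):
    """Mean theta per class; assumes every class has at least one example."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, y in zip(features, labels):
        counts[y] += 1
        for k in range(dim):
            sums[y][k] += x[k]
    return [[s / counts[c] for s in sums[c]] for c in range(n_classes)]


def predict_proba(x, cents, temperature=0.1):
    """Class probabilities from a softmax over negative squared distances."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in cents]
    return softmax([-d / temperature for d in dists])


def predict(x, cents):
    probs = predict_proba(x, cents)
    return max(range(len(cents)), key=lambda c: probs[c])


def self_train(lab_x, lab_y, unlab_x, n_classes, threshold=0.9, max_rounds=5):
    """Pseudo-label confident unlabeled documents, then refit the centroids."""
    xs, ys, pool = list(lab_x), list(lab_y), list(unlab_x)
    for _ in range(max_rounds):
        cents = centroids(xs, ys, n_classes)
        remaining = []
        for x in pool:
            probs = predict_proba(x, cents)
            best = max(range(n_classes), key=lambda c: probs[c])
            if probs[best] >= threshold:
                xs.append(x)
                ys.append(best)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # no new pseudo-labels: stop early
            break
        pool = remaining
    return centroids(xs, ys, n_classes)


# Toy usage: 4-word vocabulary, 2 topics; words 0-1 load on topic 0, words 2-3 on topic 1
w_mu = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
w_logvar = [[0.0, 0.0] for _ in range(4)]
lab_theta = [encode_theta(d, w_mu, w_logvar) for d in ([3, 1, 0, 0], [0, 0, 2, 2])]
unlab_theta = [encode_theta(d, w_mu, w_logvar) for d in ([2, 2, 0, 0], [0, 1, 3, 0], [1, 0, 0, 3])]
cents = self_train(lab_theta, [0, 1], unlab_theta, n_classes=2)
print(predict(encode_theta([0, 0, 0, 5], w_mu, w_logvar), cents))  # 1
```

Because the classifier only consumes theta, the nearest-centroid model could be swapped for any classifier that outputs class probabilities without changing the self-training loop; the threshold trades pseudo-label coverage against label noise.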

Keywords: variational topic model; semi-supervised learning; text classification

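CLC numbers: TN911.7 [Electronics and Telecommunications: Communication and Information Systems]; TP181 [Electronics and Telecommunications: Information and Communication Engineering]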