Text feature extraction based on sparse balanced variational autoencoder

Times cited: 1

Author: CHE Lei [1]

Affiliation: [1] School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China

Source: Journal of National University of Defense Technology, 2022, No. 1, pp. 169-178 (10 pages)

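Funding: General Social Science Project of the Beijing Municipal Education Commission (SM201911232003); Key Project of the Teaching Reform Program of Beijing Information Science & Technology University (2020JGZD03); Humanities and Social Sciences Planning Fund of the Ministry of Education (20YJAZH129).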

Abstract: Text feature extraction faces three problems: high-dimensional data features have low discriminability, rule-based feature learning has poor self-learning ability, and variational autoencoders suffer from excessive pruning. To address them, a text feature extraction model based on a sparse balanced variational autoencoder (SBVAE) is proposed. To eliminate noise interference and improve the robustness of the model, a bidirectional noise reduction mechanism is applied at the input layer. A sparse balance method, combined with a simulated annealing algorithm for the weight of the KL (Kullback-Leibler) term, is proposed to alleviate the excessive pruning caused by the KL divergence and to force the decoder to make fuller use of the latent variables. The model thereby improves the discriminability of high-dimensional data features. Experiments were conducted on several aspects, including a comparative analysis of text feature extraction models, sparsity performance, and the influence of the sparse balance method on the variational lower bound of the latent space, and they show that the proposed model performs well. Compared with principal component analysis (PCA), the highest accuracy of the model on the Fudan and Reuters datasets increases by 12.36% and 8.06%, respectively.
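The abstract describes the training objective only in words. The following is a minimal Python sketch, under stated assumptions, of the ingredients it names: corruption of the input layer for denoising, the KL term of the VAE objective, an annealed KL weight, and a sparsity penalty. It is not the author's implementation. Masking noise stands in for the unspecified bidirectional noise reduction mechanism, a logistic KL-annealing schedule stands in for the simulated annealing of the KL-term weight, and the classic sparse-autoencoder KL penalty stands in for the sparse balance method. The functions `corrupt`, `gaussian_kl`, `kl_weight`, `sparsity_penalty`, `total_loss` and the parameters `k`, `rho`, `lam` are hypothetical.

```python
import math
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: zero each input feature with probability drop_prob.

    Assumption: a generic denoising-autoencoder corruption; the paper's
    bidirectional noise reduction mechanism is not reproduced here.
    """
    return [0.0 if rng.random() < drop_prob else v for v in x]


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def kl_weight(step, total_steps, k=10.0):
    """Logistic KL-annealing schedule: the weight rises from about 0 to about 1.

    Assumption: the paper's annealing schedule is not given in the abstract;
    `k` (steepness) is a hypothetical parameter.
    """
    x = k * (step / total_steps - 0.5)
    return 1.0 / (1.0 + math.exp(-x))


def sparsity_penalty(mean_activations, rho=0.05):
    """Hypothetical sparsity term: sum_j KL(Bernoulli(rho) || Bernoulli(rho_hat_j))."""
    total = 0.0
    for rho_hat in mean_activations:
        rho_hat = min(max(rho_hat, 1e-8), 1.0 - 1e-8)  # avoid log(0)
        total += rho * math.log(rho / rho_hat)
        total += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat))
    return total


def total_loss(recon_loss, mu, log_var, mean_activations, step, total_steps, lam=0.1):
    """Hypothetical combined objective: recon + beta(step) * KL + lam * sparsity."""
    beta = kl_weight(step, total_steps)
    return (recon_loss
            + beta * gaussian_kl(mu, log_var)
            + lam * sparsity_penalty(mean_activations))


if __name__ == "__main__":
    rng = random.Random(0)
    # Corrupt a toy term-frequency vector before it enters the encoder.
    print(corrupt([1.0, 2.0, 3.0, 4.0], drop_prob=0.25, rng=rng))
    # Show how the KL weight grows over training.
    for step in (0, 250, 500, 750, 1000):
        print(step, round(kl_weight(step, 1000), 4))
```

Keeping the KL weight small early in training lets the decoder learn to rely on the latent code before the KL term pulls the approximate posterior toward the prior; this is the usual motivation for KL annealing as a remedy for over-pruned latent units.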

Keywords: variational autoencoder; noise reduction; sparse balance; excessive pruning

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
