Author: CHE Lei (车蕾) [1]
Affiliation: [1] School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China
Source: Journal of National University of Defense Technology (《国防科技大学学报》), 2022, No. 1, pp. 169-178 (10 pages)
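Funding: General Project of the Social Science Program of the Beijing Municipal Education Commission (SM201911232003); Key Teaching Reform Project of Beijing Information Science & Technology University (2020JGZD03); Humanities and Social Sciences Planning Fund of the Ministry of Education (20YJAZH129).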
Abstract: To address the low discriminability of high-dimensional features in text feature extraction, the poor self-learning ability of rule-based feature learning, and over-pruning in variational autoencoders, a text feature extraction model based on the sparse balanced variational autoencoder (SBVAE) is proposed. To eliminate noise interference and improve the robustness of the model, a bidirectional denoising mechanism is applied at the input layer. A sparse balancing treatment, combined with a simulated annealing algorithm for the weight of the KL (Kullback-Leibler) term, is proposed to alleviate the over-pruning caused by the KL divergence and to force the decoder to make fuller use of the latent variables. The model improves the discriminability of high-dimensional data features. Experiments compare text feature extraction models and examine sparsity performance and the effect of sparse balancing on the variational lower bound of the hidden space; the results indicate that the model performs well. On the Fudan and Reuters datasets, the model's highest accuracy improves on that of principal component analysis (PCA) by 12.36% and 8.06%, respectively.
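The role of the KL-term weight can be illustrated with a small sketch. The abstract does not give the paper's annealing schedule or its sparse balancing term, so the code below substitutes a plain linear warm-up of the KL weight, a common way to reduce over-pruning in variational autoencoders. The names `kl_weight`, `gaussian_kl`, `annealed_loss`, and `warmup_steps` are hypothetical and do not come from the paper.

```python
import math


def kl_weight(step, warmup_steps):
    """Assumed schedule: linear warm-up of the KL weight from 0 to 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / warmup_steps))


def gaussian_kl(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def annealed_loss(recon_loss, mu, logvar, step, warmup_steps):
    """Negative variational lower bound with the KL term scaled by the weight."""
    return recon_loss + kl_weight(step, warmup_steps) * gaussian_kl(mu, logvar)


if __name__ == "__main__":
    # One latent dimension with mean 1 and unit variance.
    for step in (0, 50, 200):
        print(step, annealed_loss(2.0, [1.0], [0.0], step, warmup_steps=100))
```

Early in training the weight is near zero, so the KL penalty does little to push latent dimensions toward the prior, and the decoder has a chance to start using them before the full penalty applies.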
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]