A Siamese Neural Topic Model Based on Information Maximizing Variational Autoencoder  (Cited by: 3)


Authors: Liu Jiaqi (刘佳琦); Li Yang (李阳) (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China)

Affiliation: [1] School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China

Source: Computer Applications and Software (《计算机应用与软件》), 2020, No. 9, pp. 118-125 (8 pages)

Abstract: Neural topic models based on the variational autoencoder are a typical class of topic models. Because these models ignore the similarity between documents, the latent variables of semantically similar documents may lie far apart. In addition, the latent variables tend to be ignored during training of the variational autoencoder, which prevents the model from learning good vector representations of documents. To address these problems, this paper proposes a Siamese neural topic model and its variant, which extend the neural topic model with a Siamese network to introduce similarity information between documents. The sub-structure of the network uses an information maximizing variational autoencoder to build the topic model, which strengthens the correlation between latent variables and documents. Experimental results show that the model performs well in document retrieval tasks and that the extracted topics are readily interpretable.

Keywords: variational autoencoder; topic model; document representation

Classification number: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
