Authors: Liu Jiaqi (刘佳琦); Li Yang (李阳)
Affiliation: [1] School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China
Source: Computer Applications and Software (《计算机应用与软件》), 2020, No. 9, pp. 118-125 (8 pages)
Abstract: Neural topic models based on variational autoencoders are a typical class of topic models. Because these models ignore the similarity between documents, the latent variables of semantically similar documents may lie far apart. In addition, latent variables tend to be ignored during training of the variational autoencoder, so the model fails to learn good vector representations of documents. To address these problems, this paper proposes a Siamese neural topic model and its variant, which extend the neural topic model with a Siamese network and thereby introduce similarity information between documents. The sub-structure of the network uses an information-maximizing variational autoencoder to build the topic model, which strengthens the correlation between latent variables and documents. Experimental results show that the model performs well on document retrieval tasks and that the extracted topics have good interpretability.
Classification No.: TP3 [Automation and Computer Technology - Computer Science and Technology]