Structured deep text clustering model based on multi-layer semantic fusion (cited by: 3)


Authors: MA Shengwei, HUANG Ruizhang [1,2], REN Lina, LIN Chuan [1,2]

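Affiliations: [1] State Key Laboratory of Public Big Data (Guizhou University), Guiyang, Guizhou 550025, China; [2] College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China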

Source: Journal of Computer Applications, 2023, No. 8, pp. 2364-2369 (6 pages)

Funding: National Natural Science Foundation of China (62066007).

Abstract: In recent years, Graph Neural Networks (GNNs) have been incorporated into deep text clustering because of the advantages that structural information offers in machine learning. However, existing GNN-based deep text clustering algorithms overlook the decoder's important role in semantic complementation when fusing text semantic information, which causes semantic information to be lost in the data generation part. To address this problem, a Structured Deep text Clustering Model based on multi-layer Semantic fusion (SDCMS) is proposed. The model uses a GNN to integrate structural information into the decoder, enhances the representation of text data through layer-by-layer semantic complementation, and obtains better network parameters through a triple self-supervision mechanism. Experiments on five real datasets (Citeseer, ACM, Reuters, DBLP and Abstract) show that, compared with the current best-performing Attention-driven Graph Clustering Network (AGCN) model, SDCMS improves accuracy, Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) by up to 5.853%, 9.922% and 8.142%, respectively.
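The three evaluation measures named in the abstract can be illustrated with a small standard-library sketch. This is not the authors' evaluation code. It assumes that ARI refers to the adjusted Rand index, that NMI is normalized by the arithmetic mean of the two entropies, and that accuracy is taken under the best one-to-one mapping of cluster ids to labels. The function names are hypothetical, and the brute-force mapping is only practical for a handful of clusters.

```python
from collections import Counter
from itertools import permutations
from math import comb, log


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to labels."""
    labels = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad with dummy labels so every cluster can be assigned something.
    if len(clusters) > len(labels):
        labels = labels + [None] * (len(clusters) - len(labels))
    pair_counts = Counter(zip(y_pred, y_true))
    best_hits = 0
    for perm in permutations(labels, len(clusters)):
        hits = sum(pair_counts[(c, l)] for c, l in zip(clusters, perm))
        best_hits = max(best_hits, hits)
    return best_hits / len(y_true)


def normalized_mutual_info(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    a, b = Counter(y_true), Counter(y_pred)
    mi = sum(c / n * log(c * n / (a[t] * b[p])) for (t, p), c in joint.items())
    h_true = -sum(c / n * log(c / n) for c in a.values())
    h_pred = -sum(c / n * log(c / n) for c in b.values())
    denom = (h_true + h_pred) / 2
    # Both partitions trivial (one group each): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


def adjusted_rand_index(y_true, y_pred):
    """Rand index corrected for chance agreement; needs at least 2 samples."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    truth = [0, 0, 1, 1, 2, 2]
    predicted = [0, 0, 0, 1, 1, 1]
    print(clustering_accuracy(truth, predicted))
    print(normalized_mutual_info(truth, predicted))
    print(adjusted_rand_index(truth, predicted))
```

All three measures ignore the particular ids assigned to clusters, so a prediction that merely renames the true groups scores 1.0 on each.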

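Keywords: deep text clustering; layer-by-layer semantic enhancement; text semantic information; Graph Neural Network (GNN); self-supervised learning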

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
