Text Clustering Based on Deep Contrast Learning (基于深度对比学习的文本聚类)

Cited by: 1


Authors: XU Guixian (胥桂仙) [1], LI Xiaorong (李晓荣) (School of Information Engineering, Minzu University of China, Beijing 100081, China)

Affiliation: [1] School of Information Engineering, Minzu University of China, Beijing 100081

Source: Journal of Minzu University of China (Natural Sciences Edition), 2024, Issue 3, pp. 62-72 (11 pages)

Funding: Beijing Social Science Fund Project (20YYB011).

Abstract: Unsupervised clustering aims to partition data into meaningful or useful clusters according to distances in a representation space. In that space, however, different categories often overlap. To separate the categories well, this work starts from an instance-wise contrastive learning model (SCCL), changes its activation function to Tanh, and replaces its single-layer perceptron (SLP) with a multilayer perceptron (MLP). The result is the Clustering with Deep Contrastive Learning model (CDCL). The model first feeds the raw Chinese long-text dataset into a BERT feature-extraction layer. It then passes all extracted features to an Instance-wise Contrastive Learning (Instance-CL) layer, which optimizes them, and finally clusters the optimized features with K-means. On the THUCNews dataset, CDCL improves Chinese long-text clustering accuracy by 10% to 25% over unsupervised clustering. The results indicate that the model separates overlapping categories more effectively, and its experimental results are significantly better than those of other existing related models.
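The abstract names three stages after BERT feature extraction: an MLP projection head with Tanh, an instance-wise contrastive objective, and K-means, evaluated by clustering accuracy. Below is a minimal, dependency-free Python sketch of those stages under several assumptions that the abstract does not specify:

- BERT embeddings are taken as given lists of floats.
- The "multilayer perceptron with Tanh" is read as a two-layer head with Tanh on the hidden layer.
- The Instance-CL objective is assumed to be the NT-Xent (InfoNCE-style) loss over two augmented views per text, as in SCCL-style instance-wise contrastive learning.
- Accuracy is assumed to use the usual best one-to-one mapping between clusters and classes.

The function names (`mlp_head`, `instance_cl_loss`, `kmeans`, `cluster_accuracy`), weights, and temperature are hypothetical. This is an illustration, not the authors' implementation.

```python
import math
import random
from itertools import permutations


def mlp_head(x, w1, b1, w2, b2):
    """Two-layer projection head: Tanh on the hidden layer, linear output.

    Assumption: this is how "MLP with Tanh" is read here; sizes are arbitrary.
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


def _l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v)) or 1.0  # guard against zero vectors
    return [a / norm for a in v]


def instance_cl_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over 2N views; (z1[i], z2[i]) are the positive pairs.

    Assumption: SCCL-style Instance-CL objective; temperature is illustrative.
    """
    z = [_l2_normalize(v) for v in z1 + z2]
    n, m = len(z1), 2 * len(z1)
    total = 0.0
    for i in range(m):
        j = (i + n) % m  # index of the other view of the same text
        sims = [sum(a * b for a, b in zip(z[i], z[k])) / temperature
                for k in range(m)]
        # The denominator sums over every view except the anchor itself.
        denom = sum(math.exp(s) for k, s in enumerate(sims) if k != i)
        total -= math.log(math.exp(sims[j]) / denom)
    return total / m


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster id per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_accuracy(true_labels, pred_labels, k):
    """Best one-to-one cluster-to-class mapping (brute force; small k only)."""
    best = 0
    for perm in permutations(range(k)):
        hits = sum(1 for t, p in zip(true_labels, pred_labels) if perm[p] == t)
        best = max(best, hits)
    return best / len(true_labels)
```

In the pipeline the abstract describes, the head (and possibly BERT) weights would be trained by gradient descent to minimize `instance_cl_loss`. This sketch omits that training step. Afterwards, `kmeans` would run on the optimized features, and `cluster_accuracy` would score the result against the THUCNews labels. The brute-force mapping is only practical for a handful of classes; the Hungarian algorithm is the usual choice for larger k.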

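Keywords: instance-wise contrastive learning model; deep contrastive learning clustering model; long-text clustering; K-means; instance-wise contrastive learning layer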

CLC number: TP301 [Automation and Computer Technology: Computer System Architecture]

 
