Short Text Clustering Study with Dual Features


Authors: ZHANG Qiaonan (张桥男); LIU Yuan (刘渊) (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; Jiangsu Key Laboratory of Media Design and Software Technology (Jiangnan University), Wuxi 214122, China)

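Affiliations: [1] School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China; [2] Jiangsu Key Laboratory of Media Design and Software Technology (Jiangnan University), Wuxi, Jiangsu 214122, China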

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2024, No. 10, pp. 2420-2427 (8 pages)

Funding: Supported by the National Natural Science Foundation of China (61972182).

Abstract: Short text clustering aims to discover the semantic categories of data from distances in the representation space. Traditional text representation models yield high-dimensional, sparse features when applied to short texts, and little work has examined BERT-based multi-feature short text clustering. To address both problems, this paper studies BCCA, a BERT-based dual-feature short text clustering model. First, BERT is used to obtain word vector representations. Second, a CNN strengthens the extraction of local text features, and a context-aware self-attention network strengthens the extraction of global features. Finally, to further improve clustering performance, the text representation module and the clustering module are trained jointly, so that text representation and clustering are optimized at the same time. To verify model performance, experiments are conducted on three datasets; the results show that the proposed model reaches an accuracy of 82.8% on the SearchSnippets dataset.
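The dual-feature idea in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the BCCA implementation, and everything below is an assumption made for illustration: random vectors stand in for BERT word vectors, a single convolution layer with max pooling stands in for the CNN branch, plain scaled dot-product self-attention replaces the context-aware variant (whose details the abstract does not give), and a Student's t soft assignment in the style of deep embedded clustering stands in for the clustering module. All function names and sizes are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_features(tokens, filters, width=3):
    """CNN-style branch: slide each filter over windows of `width` tokens,
    apply ReLU, then max-pool over positions (one value per filter)."""
    windows = [
        [x for tok in tokens[i:i + width] for x in tok]
        for i in range(len(tokens) - width + 1)
    ]
    return [max(max(0.0, dot(w, f)) for w in windows) for f in filters]


def global_features(tokens):
    """Self-attention branch (assumed plain variant): scaled dot-product
    attention with Q = K = V = tokens, then mean-pool over positions."""
    dim = len(tokens[0])
    attended = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in tokens])
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(dim)]
        )
    return [sum(row[j] for row in attended) / len(attended) for j in range(dim)]


def dual_feature(tokens, filters, width=3):
    """Concatenate the local and global features into one text representation."""
    return local_features(tokens, filters, width) + global_features(tokens)


def soft_assign(z, centroids):
    """Student's t soft cluster assignment (an assumed stand-in for the
    clustering module): similarity 1 / (1 + squared distance), normalized."""
    sims = [
        1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(z, c))) for c in centroids
    ]
    total = sum(sims)
    return [s / total for s in sims]


# Hypothetical sizes; the random vectors stand in for BERT word vectors.
rng = random.Random(0)
DIM, WIDTH, N_FILTERS, N_TOKENS, N_CLUSTERS = 8, 3, 4, 6, 3
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_TOKENS)]
filters = [[rng.gauss(0, 1) for _ in range(DIM * WIDTH)] for _ in range(N_FILTERS)]

z = dual_feature(tokens, filters, WIDTH)
centroids = [[rng.gauss(0, 1) for _ in range(len(z))] for _ in range(N_CLUSTERS)]
q = soft_assign(z, centroids)
print(len(z), q)
```

In a jointly trained model, the filter weights, the attention parameters, and the centroids would all be updated by one clustering loss computed from assignments such as `q`, so that representation and clustering improve together. The sketch only runs the forward pass.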

Keywords: short text clustering; dual features; context awareness; BERT; CNN

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
