Authors: ZHANG Qiaonan, LIU Yuan
Affiliations: [1] School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, Jiangsu, China; [2] Jiangsu Key Laboratory of Media Design and Software Technology (Jiangnan University), Wuxi 214122, Jiangsu, China
Source: Journal of Chinese Computer Systems, 2024, No. 10, pp. 2420-2427 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (61972182).
Abstract: The goal of short text clustering is to discover the semantic classes of data based on distances in the representation space. To address the high-dimensional, sparse features that traditional text representation models produce on short texts, and the scarcity of research on BERT-based multi-feature short text clustering, this paper proposes BCCA, a BERT-based dual-feature short text clustering model. First, BERT is used to obtain word vector representations. Next, a CNN strengthens the extraction of local text features, while a context-aware self-attention network strengthens the extraction of global features. Finally, to further improve clustering quality, the text representation module and the clustering module are trained jointly, optimizing the text representation and the clustering at the same time. Experiments on three datasets validate the model's performance; the proposed model reaches 82.8% accuracy on the SearchSnippets dataset.
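The full paper is not reproduced in this record, so the following is only a minimal NumPy sketch of the dual-feature idea the abstract describes: a 1D convolution over token embeddings for local features, a single-head self-attention pass for global features, and pooling plus concatenation of the two. All function names, shapes, and pooling choices here are illustrative assumptions, not the paper's actual BCCA implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv1d_local(X, W, b):
    # X: (seq_len, d_in) token embeddings; W: (k, d_in, d_out) conv kernel.
    # Same-padding 1D convolution followed by ReLU -> local features.
    k = W.shape[0]
    pad = k // 2
    Xp = np.pad(X, ((pad, pad), (0, 0)))
    out = np.stack([
        np.tensordot(Xp[i:i + k], W, axes=([0, 1], [0, 1])) + b
        for i in range(X.shape[0])
    ])
    return np.maximum(out, 0.0)

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention -> context-aware global features.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    return scores @ V

def dual_feature(X, conv_params, attn_params):
    # Max-pool the local branch, mean-pool the global branch, concatenate.
    local = conv1d_local(X, *conv_params).max(axis=0)
    global_ = self_attention(X, *attn_params).mean(axis=0)
    return np.concatenate([local, global_])
```

In the paper the input embeddings would come from BERT and both branches would be trained; this sketch only shows how the two feature paths combine into one fixed-length text vector.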
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
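The abstract also mentions jointly training the representation and clustering modules. The paper's exact loss is not given here; a common scheme for this kind of joint optimization is the DEC-style clustering objective (soft cluster assignments via a Student's t kernel, sharpened into a target distribution, matched with a KL loss). The sketch below shows that objective under that assumption; it is not confirmed to be BCCA's formulation.

```python
import numpy as np

def soft_assign(Z, centers, alpha=1.0):
    # Soft assignment of embeddings Z (n, d) to cluster centers (K, d)
    # using a Student's t-distribution kernel (DEC-style).
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(Q):
    # Sharpened auxiliary target: emphasizes high-confidence assignments.
    w = Q ** 2 / Q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(Q, P):
    # KL(P || Q), summed over all samples; minimized jointly with the
    # representation so embeddings and clusters improve together.
    return float((P * np.log(P / Q)).sum())
```

In a joint-training loop, gradients of this loss would flow back through the text representation (the dual-feature encoder), so both the embedding space and the cluster centers are refined simultaneously.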