Research on text clustering method based on knowledge graph fusion (cited by: 2)

Authors: Gong Zhi; Ma Ling; Liu Min; He Xianbo [3]

Affiliations: [1] School of Computer Science and Engineering, Hunan University of Information Technology, Changsha 410151, Hunan, China; [2] School of Information, Southwest Petroleum University, Chengdu 637001, Sichuan, China; [3] School of Computer Science, China West Normal University, Nanchong 637002, Sichuan, China

Source: Journal of Nanjing University of Science and Technology, 2022, No. 2, pp. 170-176 (7 pages)

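Funding: Outstanding Youth Project of the Scientific Research Program of the Hunan Provincial Department of Education (19B397); Natural Science Foundation of Sichuan Province (2018GFW0151).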

Abstract: To improve text clustering performance, the affinity propagation (AP) algorithm is used for clustering, with a knowledge graph used to pre-analyze the samples and make them better suited to AP. First, the texts to be clustered undergo knowledge graph triple analysis, which generates sample sets of the corresponding concepts, entities and relations. Next, an AP text clustering model is built and its bias parameter is optimized with the differential evolution (DE) algorithm. Finally, AP clustering is run with the bias parameter of the best individual found by DE, and the decision and potential matrices of AP are updated iteratively until the clustering result stabilizes. Experiments show that, after knowledge graph analysis and with a suitable DE differential scaling factor and crossover rate, the DE-AP algorithm achieves higher clustering accuracy with a lower root mean squared error (RMSE) of that accuracy. It also achieves higher clustering accuracy than commonly used text clustering algorithms.

Keywords: text clustering; affinity propagation algorithm; knowledge graph; differential evolution; bias parameter

Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
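
The DE-AP step outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several things are assumptions: the "bias parameter" is taken to be the single AP preference value placed on the diagonal of the similarity matrix; the two message matrices are written in the usual responsibility/availability form; similarity is the negative squared Euclidean distance; and the DE fitness is a silhouette score, since the record does not state the fitness actually used. The knowledge-graph preprocessing is not sketched, so the input `X` is assumed to be numeric feature vectors already derived from the triples. All function names (`affinity_propagation`, `silhouette`, `de_ap`) are hypothetical.

```python
import numpy as np

def affinity_propagation(S, preference, damping=0.8, n_iter=200):
    """Run AP on similarity matrix S; `preference` fills the diagonal."""
    n = S.shape[0]
    idx = np.arange(n)
    S = S.copy()
    S[idx, idx] = preference
    R = np.zeros((n, n))  # responsibility messages
    A = np.zeros((n, n))  # availability messages
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[idx, top].copy()
        AS[idx, top] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[idx, top] = S[idx, top] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0)
        A_new[idx, idx] = diag  # self-availability is not clipped at 0
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return np.zeros(n, dtype=int)  # degenerate case: one cluster
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return labels

def silhouette(D, labels):
    """Mean silhouette from distance matrix D (assumed fitness, see lead-in)."""
    ks = np.unique(labels)
    if ks.size < 2 or ks.size >= len(labels):
        return -1.0
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            scores.append(0.0)  # singleton cluster
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])
        scores.append((b - a) / max(a, b, 1e-12))
    return float(np.mean(scores))

def de_ap(X, F=0.5, CR=0.9, pop_size=8, generations=10, seed=0):
    """DE/rand/1/bin search over the AP preference, then a final AP run."""
    rng = np.random.default_rng(seed)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S, D = -sq, np.sqrt(sq)
    off = S[~np.eye(len(X), dtype=bool)]
    lo, hi = off.min(), off.max()  # assumed search range for the preference
    d = 1                          # one decision variable: the preference

    def fitness(v):
        return silhouette(D, affinity_propagation(S, v[0]))

    pop = rng.uniform(lo, hi, size=(pop_size, d))
    fit = np.array([fitness(v) for v in pop])
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.choice(others, 3, replace=False)
            mutant = np.clip(pop[a] + F * (pop[b] - pop[c]), lo, hi)
            # Binomial crossover; with d == 1 the forced index always
            # selects the mutant, so CR only matters for d > 1.
            mask = rng.random(d) < CR
            mask[rng.integers(d)] = True
            trial = np.where(mask, mutant, pop[i])
            f = fitness(trial)
            if f >= fit[i]:  # greedy selection
                pop[i], fit[i] = trial, f
    best = int(fit.argmax())
    return affinity_propagation(S, pop[best, 0]), float(pop[best, 0])

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X = np.vstack([c + 0.5 * rng.standard_normal((10, 2)) for c in centers])
    labels, preference = de_ap(X)
    print(len(set(labels.tolist())), "clusters, preference =", preference)
```

Because this sketch searches a single scalar, the crossover rate has no effect here. It would only matter if the bias parameter were treated as a vector, for example one preference per sample.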
