Multi-User Complaint Data Stream Clustering Algorithm Based on Text Mining (cited by: 7)

Authors: 陈谦 (CHEN Qian) [1]; 徐兴梅 (XU Xing-mei) [1]; 陈帅 (CHEN Shuai) (College of Information Technology, Jilin Agricultural University, Changchun, Jilin 130118, China)

Affiliation: [1] College of Information Technology, Jilin Agricultural University, Changchun, Jilin 130118

Source: Computer Simulation (《计算机仿真》), 2022, No. 5, pp. 423-426, 498 (5 pages)

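Funding: Jilin Province Science and Technology Development Plan Projects (20200403176SF, 20180418014FG); Scientific Research Project of the Education Department of Jilin Province (JJKH20190925KJ).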

Abstract: Traditional data clustering algorithms do not mine text information, which leads to poor clustering results. This paper therefore proposes a multi-user complaint data stream clustering algorithm based on text mining. Following the principles of text mining, a support vector machine is selected as the text clustering model. In the algorithm design, the text features of the multi-user complaint data are extracted first; then, according to the keyword weights and the total number of feature items, the high-dimensional vector space is reduced in dimension and unimportant feature items are deleted. A comprehensive measurement method then computes the Euclid distance, the Herman distance, and the sine similarity to obtain the similarity between texts. Finally, the clustering workflow of the data stream clustering algorithm is optimized, completing the design of the algorithm. Experiments compare the clustering performance of the proposed algorithm with that of traditional algorithms; the results show that the proposed algorithm achieves a higher F1 value and better clustering performance.
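The abstract gives no formulas, so the following is only a minimal sketch of the described pipeline under stated assumptions, not the authors' implementation. It assumes TF-IDF as the keyword weight, prunes features by total weight, and blends a Euclidean-distance term with cosine similarity. Cosine similarity is swapped in here for the "sine similarity" named in the abstract, and the "Herman distance" is left out because the abstract does not define it. All function names, the `alpha` blending parameter, and the sample complaints are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) of TF-IDF weights; each doc is a non-empty token list."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] / len(doc) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


def prune_features(vocab, vectors, keep):
    """Keep the `keep` features with the largest total weight (assumed pruning rule)."""
    totals = [sum(col) for col in zip(*vectors)]
    top = sorted(range(len(vocab)), key=lambda i: -totals[i])[:keep]
    top.sort()  # preserve the original feature order
    return [vocab[i] for i in top], [[v[i] for i in top] for v in vectors]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def combined_similarity(a, b, alpha=0.5):
    """Blend a distance-based similarity with cosine similarity (hypothetical weighting)."""
    return alpha / (1.0 + euclidean(a, b)) + (1.0 - alpha) * cosine(a, b)


# Hypothetical tokenized complaints: two about refunds, two about network outages.
docs = [
    "refund delayed refund not received".split(),
    "refund delayed again".split(),
    "network outage no signal".split(),
    "network outage since monday".split(),
]
vocab, vecs = tfidf_vectors(docs)
vocab, vecs = prune_features(vocab, vecs, keep=9)

sim_01 = combined_similarity(vecs[0], vecs[1])  # two refund complaints
sim_02 = combined_similarity(vecs[0], vecs[2])  # refund vs. network complaint
print(sim_01 > sim_02)
```

In this toy data the two refund complaints score as more similar to each other than a refund complaint does to a network complaint, which is the property a clustering step would rely on. A streaming variant would update the document frequencies incrementally as complaints arrive instead of recomputing them over a fixed list.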

Keywords: text mining; data stream; clustering algorithm; support vector machine; keyword weight

Classification code: TP391.9 [Automation and Computer Technology: Computer Application Technology]
