Authors: CHEN Qian (陈谦) [1]; XU Xing-mei (徐兴梅) [1]; CHEN Shuai (陈帅)
Affiliation: [1] College of Information Technology, Jilin Agricultural University, Changchun, Jilin 130118, China
Source: Computer Simulation (《计算机仿真》), 2022, No. 5, pp. 423-426, 498 (5 pages)
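Funding: Jilin Province Science and Technology Development Plan projects (20200403176SF, 20180418014FG); Scientific Research Project of the Education Department of Jilin Province (JJKH20190925KJ).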
Abstract: Traditional data clustering algorithms do little mining of text information, which leads to poor clustering results. This paper therefore proposes a multi-user complaint data stream clustering algorithm based on text mining. Following the principles of text mining, a support vector machine is selected as the text clustering model. In the algorithm design, text features are first extracted from the multi-user complaint data; then, according to the keyword weights and the total number of feature items, the high-dimensional vector space is reduced in dimension and unimportant feature items are deleted. A comprehensive measurement method then computes the Euclidean distance, the Herman distance, and the sinusoidal similarity to obtain the similarity between texts. Finally, the clustering workflow of the data stream clustering algorithm is optimized, completing the design. Experiments comparing the proposed algorithm with traditional algorithms show that the proposed algorithm achieves a higher F1 value and better clustering performance.
Keywords: text mining; data stream; clustering algorithm; support vector machine; keyword weight
Classification number: TP391.9 [Automation and Computer Technology: Computer Application Technology]