Authors: WANG Hong-lin [1], LI Zhong-wei
Affiliation: [1] School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China
Source: Computer Simulation (《计算机仿真》), 2024, No. 3, pp. 352-358 (7 pages)
Funding: National Natural Science Foundation of China, Youth Program (62101274).
Abstract: Traditional text data mining algorithms cluster text poorly in big data scenarios, so this paper proposes a user comment clustering algorithm based on text data mining for such scenarios. First, an improved information gain algorithm is designed to extract features from user comment data; text keywords and imbalanced data items are extracted according to information entropy to form the feature data. Next, an improved clustering data mining algorithm is used to cluster the feature data. Finally, the improved clustering data mining algorithm is parallelized on the Spark framework. Experiments are designed to verify and analyze the performance of the proposed feature extraction and clustering algorithms. The results show that, in big data scenarios, the proposed algorithm outperforms the traditional algorithm in running time, accuracy, and speedup ratio.
Classification No.: TP391.9 [Automation and Computer Technology - Computer Application Technology]
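
The abstract mentions feature extraction by information gain computed from information entropy. The record does not give the paper's improved variant, so the following is only a minimal sketch of the standard, textbook information gain for a binary "term occurs in comment" feature. The function names (`entropy`, `information_gain`, `top_terms`) and the toy comments and labels are assumptions made for illustration, not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Standard information gain of the feature 'term occurs in the document'.

    docs is a list of sets of tokens; labels is the parallel list of classes.
    This is the textbook definition, not the paper's improved algorithm.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def top_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties by name)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy data: four tokenized user comments with sentiment labels.
docs = [
    {"fast", "delivery", "good"},
    {"good", "quality"},
    {"slow", "delivery", "bad"},
    {"bad", "quality"},
]
labels = ["pos", "pos", "neg", "neg"]

if __name__ == "__main__":
    for term in top_terms(docs, labels, 3):
        print(term, round(information_gain(docs, labels, term), 4))
```

In this toy set, "good" and "bad" separate the two classes perfectly, while "delivery" and "quality" appear equally in both classes and carry no information. Terms selected this way could then feed a clustering step; how the paper handles imbalanced data items and the Spark parallelization is not specified in this record.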