Text Mining Algorithm for User Comment Clustering in Big Data Scenarios


Authors: WANG Hong-lin (王红林)[1], LI Zhong-wei (李忠伟) (School of Artificial Intelligence/School of Future Technology, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China)

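Affiliation: [1] School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China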

Source: Computer Simulation (《计算机仿真》), 2024, No. 3, pp. 352-358 (7 pages)

Funding: National Natural Science Foundation of China, Youth Project (62101274).

Abstract: Traditional text data mining algorithms cluster text poorly in big data scenarios. To address this, the paper proposes a user comment clustering algorithm based on text data mining for such scenarios. First, an improved information gain algorithm extracts features from user comment data: text keywords and imbalanced data items are selected according to information entropy to form the feature data. Next, an improved clustering data mining algorithm clusters the feature data. Finally, the improved clustering algorithm is parallelized on the Spark framework. Experiments evaluate the proposed feature extraction and clustering algorithms. The results show that, in big data scenarios, the proposed algorithm outperforms traditional algorithms in running time, accuracy, and speedup ratio.

Keywords: big data; feature extraction; clustering mining; parallelization

Classification code: TP391.9 [Automation and Computer Technology: Computer Application Technology]
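
Illustrative note: the feature extraction step described in the abstract builds on information gain, which scores a term by how much knowing its presence reduces the entropy of the category distribution. The sketch below shows only the textbook form of that score, not the paper's improved variant, whose details the abstract does not give. The function names (`entropy`, `information_gain`, `top_terms`), the toy comments, and the `pos`/`neg` labels are all assumptions made for illustration, and the sketch assumes a labeled sample of comments is available.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Textbook IG(t) = H(C) - P(t) * H(C | t) - P(not t) * H(C | not t).

    docs is a list of token sets, labels the matching category labels.
    This is the standard definition, not the paper's improved algorithm.
    """
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def top_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties by name)."""
    vocab = sorted({t for doc in docs for t in doc})
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Hypothetical toy comments, already tokenized into sets of words.
docs = [
    {"fast", "delivery", "good"},
    {"good", "quality"},
    {"slow", "delivery", "bad"},
    {"bad", "quality"},
]
labels = ["pos", "pos", "neg", "neg"]

if __name__ == "__main__":
    for term in ("good", "delivery", "fast"):
        print(term, round(information_gain(docs, labels, term), 4))
    print(top_terms(docs, labels, 2))
```

In this toy sample, terms that appear in only one category score highest, while a term spread evenly across categories scores zero, which is why such a score can serve as a filter before clustering.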

 
