Application of Spark-based Mean Shift Algorithm in Network Public Opinion Clustering (cited by: 3)


Authors: ZHANG Jing-kun (张京坤), WANG Yi-yi (王怡怡) (Taiji Computer Corporation, China Electronics Technology Group Corporation, Beijing 100020, China; School of Mathematics and Information Science, Shaanxi Normal University, Xi'an 710100, China)

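Affiliations: [1] Taiji Computer Corporation Limited, China Electronics Technology Group Corporation, Beijing 100020; [2] School of Mathematics and Information Science, Shaanxi Normal University, Xi'an, Shaanxi 710100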

Source: Software Guide (《软件导刊》), 2020, No. 9, pp. 190-195 (6 pages)

Abstract: To address the inaccurate analysis of public opinion information in network public opinion situational awareness and early warning, this paper proposes a mean shift (MS) algorithm built on Spark. Starting from the principle of the MS algorithm, it analyzes the characteristics of the Spark framework and describes how the algorithm is implemented on Spark, covering public opinion information preprocessing, feature extraction, construction of the feature vector model, and the design of the clustering step. The clustering results of the MS algorithm and the K-means algorithm are compared on the same dataset. The experimental results show that K-means clustering depends on the choice of k, which can make its results inaccurate, whereas the Spark-based Mean Shift algorithm, requiring no prior conditions, clusters public opinion data better than K-means and meets expectations.
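The core of the clustering step can be illustrated with a small single-machine sketch. This is not the authors' Spark implementation (which the abstract does not show); it is a plain-Python mean shift with a Gaussian kernel, assuming documents have already been turned into numeric feature vectors. Each point is repeatedly moved to the kernel-weighted mean of all data points, x ← Σ K(xᵢ − x)·xᵢ / Σ K(xᵢ − x), until it settles on a density mode; points whose modes lie close together share a cluster. The function names, the bandwidth value, and the mode-merging threshold (half the bandwidth) are illustrative assumptions.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_kernel(dist_sq, bandwidth):
    """Gaussian kernel weight for a squared distance and bandwidth h (assumed kernel choice)."""
    return math.exp(-dist_sq / (2.0 * bandwidth ** 2))


def shift_point(point, data, bandwidth):
    """One mean shift step: return the kernel-weighted mean of all data points."""
    weights = [gaussian_kernel(sq_dist(point, x), bandwidth) for x in data]
    total = sum(weights)
    return [sum(w * x[d] for w, x in zip(weights, data)) / total
            for d in range(len(point))]


def mean_shift(data, bandwidth, tol=1e-5, max_iter=300, merge_dist=None):
    """Move every point to its density mode, then merge nearby modes into clusters.

    Returns (centers, labels); the number of clusters is not an input.
    merge_dist defaults to bandwidth / 2 (a hypothetical heuristic).
    """
    if merge_dist is None:
        merge_dist = bandwidth / 2.0
    modes = []
    for p in data:
        x = list(p)
        for _ in range(max_iter):
            new_x = shift_point(x, data, bandwidth)
            moved = math.sqrt(sq_dist(new_x, x))
            x = new_x
            if moved < tol:  # converged to a mode
                break
        modes.append(x)

    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.sqrt(sq_dist(m, c)) < merge_dist:
                labels.append(idx)  # same mode as an existing cluster
                break
        else:
            centers.append(m)  # new mode found: start a new cluster
            labels.append(len(centers) - 1)
    return centers, labels


# Usage on toy 2-D vectors standing in for document features (hypothetical data).
data = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
centers, labels = mean_shift(data, bandwidth=1.0)
print(len(centers), labels)
```

Unlike K-means, no cluster count is passed in; the two groups emerge from the data. The bandwidth h still has to be chosen, and its value here is only an assumption. Because each point's trajectory depends only on the full dataset, the per-point shifting is independent work, which is one plausible way to distribute it across Spark partitions; whether the paper does exactly this is not stated in the abstract.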

Keywords: network public opinion; clustering; mean shift; Spark; K-means

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
