Authors: 张京坤 (ZHANG Jing-kun); 王怡怡 (WANG Yi-yi)
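Affiliations: [1] Taiji Computer Corporation, China Electronics Technology Group Corporation, Beijing 100020, China; [2] School of Mathematics and Information Science, Shaanxi Normal University, Xi'an, Shaanxi 710100, China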
Source: Software Guide (《软件导刊》), 2020, No. 9, pp. 190-195 (6 pages)
Abstract: To address the inaccurate analysis of public opinion information in network public opinion situational awareness and early warning, this paper proposes a mean shift (MS) algorithm based on Spark. Starting from the principle of the mean shift algorithm, it analyzes the characteristics of the Spark framework and presents the implementation of the algorithm on Spark, covering the preprocessing of public opinion information, feature extraction, construction of the feature vector model, and the design of the clustering step. The clustering performance of the MS algorithm and the K-means algorithm is compared on the same data set. The experimental results show that the clustering results of K-means are affected by the choice of k and can therefore be inaccurate, whereas the Spark-based mean shift algorithm clusters public opinion better than K-means without any prior conditions, and its results meet expectations.
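The abstract describes clustering public opinion feature vectors with mean shift instead of K-means. The following is a minimal single-machine sketch of Gaussian-kernel mean shift in plain Python. It is not the authors' Spark implementation, and the Gaussian kernel, the helper name `mean_shift`, and its `bandwidth` and `merge_radius` parameters are illustrative assumptions. No cluster count k is passed in; the number of clusters follows from the data and the chosen bandwidth.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_shift(
    points: Sequence[Vector],
    bandwidth: float,
    max_iter: int = 300,
    tol: float = 1e-6,
    merge_radius: Optional[float] = None,
) -> Tuple[List[Vector], List[int]]:
    """Gaussian-kernel mean shift (hypothetical helper, single machine).

    Every point climbs to a local density mode by repeatedly moving to the
    kernel-weighted mean of all points; modes closer than `merge_radius`
    are merged, so the number of clusters is not fixed in advance.
    """
    if merge_radius is None:
        merge_radius = bandwidth / 2
    dims = len(points[0])
    modes: List[Vector] = []
    for start in points:
        x = start
        for _ in range(max_iter):
            # Gaussian kernel weight of every point relative to the current position.
            weights = [math.exp(-_sq_dist(x, q) / (2 * bandwidth ** 2)) for q in points]
            total = sum(weights)
            if total == 0.0:  # all weights underflowed; stay where we are
                break
            new_x = tuple(
                sum(w * q[d] for w, q in zip(weights, points)) / total
                for d in range(dims)
            )
            shift = math.sqrt(_sq_dist(new_x, x))  # length of the mean shift vector
            x = new_x
            if shift < tol:
                break
        modes.append(x)

    # Group converged modes: a mode within merge_radius of an existing
    # center joins that cluster, otherwise it starts a new one.
    centers: List[Vector] = []
    labels: List[int] = []
    for m in modes:
        for i, c in enumerate(centers):
            if math.sqrt(_sq_dist(m, c)) < merge_radius:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


if __name__ == "__main__":
    # Toy 2-D "feature vectors": two well-separated groups.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
    centers, labels = mean_shift(data, bandwidth=1.0)
    print(len(centers), labels)  # prints: 2 [0, 0, 0, 0, 1, 1, 1, 1]
```

The per-point mode search is independent for each starting point, which makes it a natural unit to distribute in a framework such as Spark (for example, one task per partition of starting points). That distribution scheme is an assumption; this record does not state how the paper partitions the work.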
Keywords: network public opinion; clustering; mean shift; Spark; K-means
CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]