Research on a Massive Public Opinion Data Analysis Method Based on Distributed Storage and Parallel Computing

Cited by: 1

Author: QIU Guoting (Xi'an Aeronautical Polytechnic Institute, Xi'an 710089, China)

Source: Electronic Design Engineering, 2023, No. 20, pp. 82-85, 90 (5 pages)

Funding: Shaanxi Province Education Science "13th Five-Year Plan" project (SGH20Y1637).

Abstract: Traditional centralized data analysis methods scale poorly to massive data sets. To address this, the paper proposes a method for analyzing massive public opinion data based on distributed storage and parallel computing. In the resulting analysis system, collected source data is stored in the Hadoop Distributed File System (HDFS), and a cache mechanism based on hotspot detection handles data reads and writes. Queries run on Spark, and a random forest algorithm performs high-precision analysis of the data. The system presents analysis results in several forms and supports querying them. Experiments on Hadoop 2.6.0 and Spark 1.5.0 show a response time of 7.8 s for 30,000 records and an analysis accuracy of 96%, both better than the comparison methods, which indicates that the method has practical application value.
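The abstract does not spell out how the hotspot-detection cache works. The following is only a minimal sketch of one plausible reading, not the paper's implementation: a key is admitted to the cache once its read count reaches a threshold, and the least recently used entry is evicted when the cache is full. The class name `HotspotCache`, the parameters `capacity` and `hot_threshold`, and the plain dictionary standing in for HDFS are all assumptions made for illustration.

```python
from collections import Counter, OrderedDict


class HotspotCache:
    """Toy read cache that admits a key only after it has been read often enough."""

    def __init__(self, backend, capacity=2, hot_threshold=3):
        self.backend = backend              # hypothetical stand-in for slow storage such as HDFS
        self.capacity = capacity            # maximum number of cached entries (assumed parameter)
        self.hot_threshold = hot_threshold  # reads needed before a key counts as "hot" (assumed)
        self.hits = Counter()               # read count per key
        self.cache = OrderedDict()          # cached entries, least recently used first
        self.backend_reads = 0              # how many reads reached the backend

    def read(self, key):
        self.hits[key] += 1
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            return self.cache[key]
        value = self.backend[key]           # slow path: fetch from the backend
        self.backend_reads += 1
        if self.hits[key] >= self.hot_threshold:
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used entry
            self.cache[key] = value
        return value

    def write(self, key, value):
        self.backend[key] = value
        if key in self.cache:
            self.cache[key] = value         # write-through keeps the cached copy consistent


if __name__ == "__main__":
    store = {"post:1": "text of post 1", "post:2": "text of post 2"}
    cache = HotspotCache(store, capacity=1, hot_threshold=2)
    for _ in range(4):
        cache.read("post:1")                # becomes hot on the second read
    cache.read("post:2")                    # read once, so it stays uncached
    print(cache.backend_reads, list(cache.cache))
```

Admitting only frequently read keys keeps one-off reads from displacing popular records, which is the usual motivation for hotspot detection in front of a distributed file system.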

Keywords: distributed storage; parallel computing; public opinion data; Hadoop Distributed File System; Spark; random forest algorithm

Classification: TP311.13 [Automation and Computer Technology - Computer Software and Theory]
