Real-time Data Acquisition and Processing Based on Spark  (cited by: 5)

Authors: HUANG Tao; GAO Li-ting[1] (Hebei University of Architecture, Zhangjiakou, 075000)

Affiliation: [1] Hebei University of Architecture, Zhangjiakou 075000, Hebei, China

Source: Journal of Hebei Institute of Architecture and Civil Engineering, 2022, No. 4, pp. 176-179, 188 (5 pages)

Abstract: The rapid development of social platforms and networks has led to ever-growing data volumes, with the amount of real-time data increasing geometrically. Real-time data analysis is therefore becoming more and more important, yet existing real-time analysis systems suffer from problems such as insufficient computing power. To address this, a Spark-based real-time data acquisition and processing method is proposed. Spark, running in a distributed environment, can process large volumes of data, compensating for the lack of computing power. Combined with Flume, Kafka can aggregate multiple data sources, so that even with heterogeneous sources Spark receives the monitored data stream in real time. The Spark Streaming module is invoked to process the data stream in real time, and the processed data can be forwarded to other processing components or databases. Experimental results show that this method can monitor, analyze, and persist log files in real time, effectively solving the real-time data processing problem.

Keywords: distributed processing; real-time data stream; log files; real-time monitoring

Classification: TP311.13 [Automation and Computer Technology / Computer Software and Theory]
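The abstract describes a Flume-to-Kafka-to-Spark Streaming pipeline in which each micro-batch of collected log lines is processed and then forwarded. As a minimal sketch of the per-batch aggregation logic such a pipeline would run (written in plain Python rather than the Spark Streaming API, and with a log-line format and a level-counting metric that are illustrative assumptions, not taken from the paper):

```python
# Sketch of the per-micro-batch processing that Spark Streaming would
# apply to log lines arriving via Kafka. The log format below is an
# assumption for illustration; in Spark this logic would be expressed
# as map/reduceByKey transformations on each batch's RDD.
from collections import Counter

def process_batch(lines):
    """Count log levels in one micro-batch of log lines.

    Each line is assumed to look like:
        "2022-10-01 12:00:00 ERROR connection refused"
    i.e. timestamp, time, level, message.
    """
    # Keep only lines with at least three whitespace-separated fields,
    # then extract the third field (the log level) and tally it.
    levels = (line.split()[2] for line in lines if len(line.split()) >= 3)
    return Counter(levels)

batch = [
    "2022-10-01 12:00:00 INFO service started",
    "2022-10-01 12:00:01 ERROR connection refused",
    "2022-10-01 12:00:02 ERROR timeout",
]
print(process_batch(batch))  # Counter({'ERROR': 2, 'INFO': 1})
```

In the actual system, the result of each batch would then be written out to a downstream component or database, as the abstract describes.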
