Real-time Data Acquisition and Processing Based on Spark  (cited by: 5)

Authors: HUANG Tao; GAO Li-ting[1] (Hebei University of Architecture, Zhangjiakou, 075000)

Affiliation: [1] Hebei University of Architecture, Zhangjiakou 075000, Hebei, China

Source: Journal of Hebei Institute of Architecture and Civil Engineering, 2022, No. 4, pp. 176-179, 188 (5 pages)

Abstract: The rapid development of social platforms and networks has led to ever-growing data volumes, with the amount of real-time data increasing geometrically. Real-time data analysis is therefore becoming more and more important, yet existing real-time analysis systems suffer from problems such as insufficient computing power. To address this, a Spark-based real-time data acquisition and processing method is proposed. Spark, running in a distributed environment, can process large volumes of data, compensating for the lack of computing power. Combined with Flume, Kafka can aggregate multiple data sources, so that even with heterogeneous sources Spark receives the monitored data stream in real time. The Spark Streaming module is invoked to process the data stream in real time, and the processed data can be forwarded to other processing components or databases. Experimental results show that this method can monitor, analyze, and persist log files in real time, effectively solving the real-time data processing problem.

Keywords: distributed processing; real-time data stream; log files; real-time monitoring

Classification: TP311.13 [Automation and Computer Technology / Computer Software and Theory]
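The abstract describes a Flume-to-Kafka-to-Spark Streaming pipeline in which each micro-batch of collected log lines is processed and then forwarded. As a minimal sketch of the per-batch aggregation logic such a pipeline would run (written in plain Python rather than the Spark Streaming API, and with a log-line format and a level-counting metric that are illustrative assumptions, not taken from the paper):

```python
# Sketch of the per-micro-batch processing that Spark Streaming would
# apply to log lines arriving via Kafka. The log format below is an
# assumption for illustration; in Spark this logic would be expressed
# as map/reduceByKey transformations on each batch's RDD.
from collections import Counter

def process_batch(lines):
    """Count log levels in one micro-batch of log lines.

    Each line is assumed to look like:
        "2022-10-01 12:00:00 ERROR connection refused"
    i.e. timestamp, time, level, message.
    """
    # Keep only lines with at least three whitespace-separated fields,
    # then extract the third field (the log level) and tally it.
    levels = (line.split()[2] for line in lines if len(line.split()) >= 3)
    return Counter(levels)

batch = [
    "2022-10-01 12:00:00 INFO service started",
    "2022-10-01 12:00:01 ERROR connection refused",
    "2022-10-01 12:00:02 ERROR timeout",
]
print(process_batch(batch))  # Counter({'ERROR': 2, 'INFO': 1})
```

In the actual system, the result of each batch would then be written out to a downstream component or database, as the abstract describes.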
