Authors: HUANG Tao (黄涛); GAO Li-ting (高丽婷) [1] (Hebei University of Architecture, Zhangjiakou, 075000)
Source: Journal of Hebei Institute of Architecture and Civil Engineering (《河北建筑工程学院学报》), 2022, No. 4, pp. 176-179, 188 (5 pages)
Abstract: The rapid growth of social platforms and networks has produced ever larger volumes of data, and the volume of real-time data is growing geometrically. Real-time data analysis is therefore increasingly important, yet existing real-time analysis systems suffer from problems such as insufficient computing power. To address this, the paper proposes a Spark-based method for real-time data collection and processing. Spark runs in a distributed environment and can process large volumes of data, which compensates for the shortfall in computing power. Because Flume and Kafka can aggregate multiple kinds of data sources, Spark can receive the monitored data stream in real time even when the sources differ. The Spark Streaming module processes the stream in real time and can forward the processed data to other processing components or to a database. Experimental results show that the method can monitor, analyze, and store log files in real time, effectively addressing the problem of real-time data processing.
Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
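
The collect, buffer, process, and store flow described in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' implementation and uses no Spark, Flume, or Kafka APIs: the `deque` named `topic` is a hypothetical stand-in for a Kafka topic fed by Flume, `next_batch` imitates a streaming micro-batch, and an in-memory SQLite table stands in for the downstream database. The log format (timestamp, level, message) and all function names are assumptions made for illustration.

```python
import sqlite3
from collections import Counter, deque

# Stand-in for a Kafka topic fed by Flume agents (hypothetical; a real
# deployment would use actual Flume sources and a Kafka broker).
topic = deque()

def collect(lines):
    """Collection step: append raw log lines to the buffer."""
    topic.extend(lines)

def next_batch(max_records):
    """Drain up to max_records lines, imitating one streaming micro-batch."""
    batch = []
    while topic and len(batch) < max_records:
        batch.append(topic.popleft())
    return batch

def count_levels(batch):
    """Processing step: count lines per severity level (assumed second field)."""
    fields = (line.split() for line in batch)
    return Counter(f[1] for f in fields if len(f) > 1)

def store(conn, batch_id, counts):
    """Sink step: persist the per-batch counts to a database table."""
    conn.executemany(
        "INSERT INTO level_counts (batch_id, level, n) VALUES (?, ?, ?)",
        [(batch_id, level, n) for level, n in counts.items()],
    )
    conn.commit()

def run_pipeline(lines, batch_size=3):
    """Run collect -> micro-batch -> process -> store until the buffer is empty."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE level_counts (batch_id INTEGER, level TEXT, n INTEGER)"
    )
    collect(lines)
    batch_id = 0
    while topic:
        store(conn, batch_id, count_levels(next_batch(batch_size)))
        batch_id += 1
    return conn

# Made-up sample log lines, for illustration only.
sample_logs = [
    "2022-01-01T00:00:00 INFO service started",
    "2022-01-01T00:00:01 ERROR disk full",
    "2022-01-01T00:00:02 INFO request served",
    "2022-01-01T00:00:03 WARN slow response",
    "2022-01-01T00:00:04 ERROR timeout",
]
conn = run_pipeline(sample_logs)
totals = dict(
    conn.execute("SELECT level, SUM(n) FROM level_counts GROUP BY level")
)
print(totals)
```

In a real deployment the buffer would be a Kafka topic, the batching and counting would be done by Spark Streaming across a cluster, and the sink would be whichever database or component the pipeline targets; the sketch only shows how the stages hand data to one another.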