Quasi-realtime Classification Method for Large-Scale Network Traffic Based on Spark
Cited by: 5

Authors: Yang Chenguang [1,2], Ma Yongzheng [2]

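Affiliations: [1] University of Chinese Academy of Sciences, Beijing 100049; [2] Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190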

Source: E-science Technology & Application, 2016, No. 2, pp. 25-34 (10 pages)

Abstract: The big data era has brought exponential growth in Internet traffic. To manage network resources effectively and improve network security, traffic must be classified quickly and accurately, which places higher demands on the real-time performance of traffic classification techniques. Most existing network traffic classification research, both in China and abroad, has been carried out in single-machine environments, whose limited computing resources make it difficult to handle (quasi-)real-time traffic classification on high-speed networks. Building on existing research results and drawing on current ideas and technologies, this paper proposes a quasi-real-time classification method for large-scale network traffic on the Spark platform, combining Spark's stream processing framework, Spark Streaming, with its machine learning library, MLlib. Experimental results show that the method achieves high classification accuracy while also providing good real-time classification capability, meeting the real-time requirements of traffic classification in real networks.
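The abstract does not describe the pipeline in detail. As a rough, hypothetical illustration of the general pattern it names (Spark Streaming cuts an incoming stream into small batches at a fixed interval, and a model trained offline, for example with MLlib, scores each batch), the pure-Python sketch below simulates that loop without Spark. Everything in it is an assumption, not the authors' method: the three flow features (packet count, mean packet size in bytes, duration in seconds), the class labels, the one-second interval, and the nearest-centroid classifier, which stands in for whatever MLlib algorithm the paper actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical flow record: a timestamp in seconds plus a feature vector
# (packet count, mean packet size in bytes, duration in seconds).
@dataclass
class Flow:
    ts: float
    features: Tuple[float, float, float]

Centroids = Dict[str, Tuple[float, ...]]

def train_centroids(samples: Iterable[Tuple[Sequence[float], str]]) -> Centroids:
    """Offline training step: average the feature vectors of each labelled class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for feats, label in samples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[label] += 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def predict(centroids: Centroids, feats: Sequence[float]) -> str:
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], feats))

def micro_batches(flows: Iterable[Flow], interval: float) -> Iterator[Tuple[int, List[Flow]]]:
    """Group a time-ordered stream into fixed-length windows, loosely mimicking
    a streaming batch interval. Yields (window index, flows in that window)."""
    current = None
    batch: List[Flow] = []
    for f in flows:
        idx = int(f.ts // interval)
        if current is not None and idx != current:
            yield current, batch
            batch = []
        current = idx
        batch.append(f)
    if batch:
        yield current, batch

def classify_stream(flows: Iterable[Flow], centroids: Centroids,
                    interval: float = 1.0) -> List[Tuple[int, Dict[str, int]]]:
    """Score every micro-batch with the pre-trained model; report class counts per batch."""
    results = []
    for idx, batch in micro_batches(flows, interval):
        counts: Dict[str, int] = defaultdict(int)
        for f in batch:
            counts[predict(centroids, f.features)] += 1
        results.append((idx, dict(counts)))
    return results

if __name__ == "__main__":
    training = [
        ((10, 80, 0.5), "dns"),
        ((12, 90, 0.7), "dns"),
        ((900, 1400, 30.0), "video"),
        ((1100, 1350, 40.0), "video"),
    ]
    model = train_centroids(training)
    stream = [
        Flow(0.1, (11, 85, 0.6)),
        Flow(0.8, (1000, 1380, 35.0)),
        Flow(1.2, (9, 70, 0.4)),
    ]
    for idx, counts in classify_stream(stream, model, interval=1.0):
        print(idx, counts)
```

With this toy data the first one-second window holds one "dns" and one "video" flow and the second holds one "dns" flow. The split between an offline training step and a cheap per-batch scoring step is what keeps per-batch latency low; a real deployment would also normalize features, since raw packet counts and byte sizes dominate the distance here.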

Keywords: Spark; traffic classification; large-scale; quasi-real-time; machine learning

CLC number: TP393 [Automation and Computer Technology: Computer Application Technology]
