An Adaptive Classification Approach Based on Information Entropy for Network Traffic in Presence of Concept Drift (original title: 基于信息熵的自适应网络流概念漂移分类方法)

Cited by: 14


Authors: Pan Wubin (潘吴斌)[1,2], Cheng Guang (程光)[1,2], Guo Xiaojun (郭晓军)[1,2], Huang Shunxiang (黄顺翔)

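Affiliations: [1] School of Computer Science and Engineering, Southeast University, Nanjing 210096; [2] Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 210096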

Source: Chinese Journal of Computers (《计算机学报》), 2017, No. 7, pp. 1556-1571 (16 pages)

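Funding: National High Technology Research and Development Program of China ("863" Program) (2015AA015603); Jiangsu Future Networks Innovation Institute, Prospective Research Project on Future Networks (BY2013095-5-03); Jiangsu Province "Six Talent Peaks" High-Level Talent Project (2011-DZ024); Fundamental Research Funds for the Central Universities; Research Innovation Program for Graduate Students of Jiangsu Province Universities (KYLX15_0118)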

Abstract: Because network traffic characteristics change over time and across network environments, the accuracy of machine-learning-based traffic classifiers degrades markedly. Updating the classifier periodically on the basis of experience is time-consuming, and it is hard to guarantee that the new classifier generalizes well. This paper therefore proposes an adaptive, information-entropy-based method for classifying network traffic under concept drift. The method first detects concept drift from changes in the information entropy of the feature attributes. It then applies an incremental ensemble learning strategy: at each detected drift point it adds a classifier built on the current traffic and removes classifiers whose performance has degraded, thereby updating the model. Finally, the classification results are combined by weighted ensemble. Experimental results show that the method detects concept drift and updates the classifier effectively, with good classification performance and generalization ability.
In recent years, traffic classification based on machine learning has shown high accuracy. Nevertheless, it depends heavily on the environment in which the training samples were collected. A classifier trained accurately in one network environment sees a great decline in accuracy when it must classify traffic from varying network conditions. Because traffic statistics and distributions change dynamically, machine-learning-based classifiers should be updated periodically to maintain performance. This issue is unavoidable for machine-learning-based traffic classification. Existing solutions lack explicit recommendations on when a classifier should be updated and how to update it effectively. This results in several shortcomings. (1) Updating a traditional traffic classifier is time-consuming, and it is unclear how often a classifier should be updated or when a new classifier will be needed. (2) Training a new classifier only on new traffic loses some previously learned knowledge, while retraining on a large dataset that combines all collected data further affects performance. (3) Traffic statistics and distributions change dynamically across network conditions, so it is hard to obtain a stable feature subset for building a robust classifier. Building a classifier that adapts to changing network conditions is therefore a huge challenge. In this paper, we develop an adaptive traffic classification approach using entropy-based detection and incremental ensemble learning, assisted by embedded feature selection.
To update the classifier in a timely and effective manner, the entropy-based detection uses a sliding-window technique to measure the statistical difference between the previous and current traffic samples by counting and comparing all instances with respect to their feature stream membership. Additionally, we discretize the range of feature values to a fixed number of bins to take t
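The windowed, binned entropy comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' exact formulation, which the abstract does not give. The sketch assumes equal-width bins over the pooled range of each feature, a per-bin binary entropy of window membership (old versus new) weighted by bin mass, and an arbitrary drift threshold of 0.9. The names `binary_entropy`, `window_entropy`, and `detect_drift` are hypothetical.

```python
import numpy as np


def binary_entropy(p):
    """Entropy in bits of a two-way split with proportion p (vectorised)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log2(0)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def window_entropy(old, new, n_bins=10):
    """Mean per-feature entropy of window membership across bins.

    Close to 1 when both windows fill the bins in the same proportions,
    close to 0 when they occupy disjoint bins.
    """
    old = np.asarray(old, dtype=float)
    new = np.asarray(new, dtype=float)
    scores = []
    for j in range(old.shape[1]):
        both = np.concatenate([old[:, j], new[:, j]])
        if both.min() == both.max():
            scores.append(1.0)  # constant feature: no drift signal
            continue
        # Assumption: equal-width bins over the pooled range of the feature.
        edges = np.linspace(both.min(), both.max(), n_bins + 1)
        c_old, _ = np.histogram(old[:, j], bins=edges)
        c_new, _ = np.histogram(new[:, j], bins=edges)
        # Normalise counts so unequal window sizes do not bias the score.
        p_old = c_old / c_old.sum()
        p_new = c_new / c_new.sum()
        mass = p_old + p_new
        occupied = mass > 0
        frac_old = p_old[occupied] / mass[occupied]
        # Weight each bin's membership entropy by its share of total mass.
        weighted = mass[occupied] / 2.0 * binary_entropy(frac_old)
        scores.append(float(np.sum(weighted)))
    return float(np.mean(scores))


def detect_drift(old, new, threshold=0.9, n_bins=10):
    """Flag drift when the entropy score falls below the assumed threshold."""
    return bool(window_entropy(old, new, n_bins) < threshold)
```

In a streaming setting, `old` and `new` would be two adjacent sliding windows of flow feature vectors, and a positive result from `detect_drift` would mark the point at which a new classifier is trained on the current window. The threshold and bin count are illustrative and would need tuning on real traffic.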

Keywords: concept drift; machine learning; entropy-based detection; incremental ensemble learning; traffic classification

Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]

 
