Authors: 潘吴斌 (Pan Wubin)[1,2], 程光 (Cheng Guang)[1,2], 郭晓军 (Guo Xiaojun)[1,2], 黄顺翔 (Huang Shunxiang)
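Affiliations: [1] School of Computer Science and Engineering, Southeast University, Nanjing 210096; [2] Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 210096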
Source: Chinese Journal of Computers (《计算机学报》), 2017, No. 7, pp. 1556-1571 (16 pages)
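Funding: National High Technology Research and Development Program of China ("863" Program) (2015AA015603); Prospective Research Project on Future Networks, Jiangsu Future Networks Innovation Institute (BY2013095-5-03); Jiangsu Province "Six Talent Peaks" High-Level Talent Project (2011-DZ024); Fundamental Research Funds for the Central Universities; Jiangsu Province Research and Innovation Program for Graduate Students in Regular Universities (KYLX15_0118)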
Abstract (translated from the Chinese): Because network traffic features change over time and across network environments, the accuracy of machine learning-based traffic classification degrades markedly. At the same time, periodically updating the classifier on the basis of experience is time-consuming, and the generalization performance of the new classifier is hard to guarantee. This paper therefore proposes an adaptive, information-entropy-based method for classifying network flows under concept drift. The method first detects concept drift from changes in the information entropy of the feature attributes. It then applies an incremental ensemble learning strategy: at each drift point it adds a classifier built from the current traffic and removes classifiers whose performance has degraded, thereby updating the classifier. Finally, it combines the classification results by weighted ensemble. Experimental results show that the method detects concept drift and updates the classifier effectively, and that it achieves good classification performance and generalization ability.
Abstract (English original): In recent years, traffic classification based on machine learning has shown high accuracy. Nevertheless, machine learning-based traffic classification depends heavily on the environment in which the training samples were collected. Although a classifier can be trained accurately in a given network environment, its accuracy declines sharply when it has to classify traffic from varying network conditions. Because traffic statistics and distributions change dynamically, machine learning-based classifiers should be updated periodically to maintain performance. This issue is unavoidable for machine learning-based traffic classification. Existing solutions lack explicit recommendations on when a classifier should be updated and how to update it effectively. This results in several shortcomings. (1) Updating a traditional traffic classifier is time-consuming, and the cost is tied to how often a classifier must be updated and when a new classifier is needed. (2) Training a new classifier only on new traffic loses some previously learned knowledge, while updating a classifier on a large dataset that combines all collected data further affects performance. (3) Traffic statistics and distributions change dynamically across network conditions, so it is hard to obtain a stable feature subset for building a robust classifier. Building a classifier that adapts to changing network conditions is therefore a huge challenge. In this paper, we develop an adaptive traffic classification method that uses entropy-based detection and incremental ensemble learning, assisted by embedded feature selection.
To update the classifier promptly and effectively, the entropy-based detection uses a sliding-window technique to measure the statistical difference between the previous and current traffic samples by counting and comparing all instances with respect to their feature stream membership. Additionally, we discretize the range of feature values into a fixed number of bins to take t
Keywords: concept drift; machine learning; information entropy detection; incremental ensemble learning; traffic classification
Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]
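The entropy-based detection step described in the abstract (two sliding windows, feature values discretized into a fixed number of bins, instances counted by which window they belong to) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the equal-width binning, the per-bin membership-entropy score, and the `threshold=0.8` default are all assumptions chosen for illustration. The score is 1 when both windows have the same binned distribution and 0 when they share no bins.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value to one of n_bins equal-width bins over [lo, hi] (assumed binning)."""
    if hi <= lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def membership_entropy(prev, curr, n_bins=10):
    """Score in [0, 1] for one feature: 1 = same binned distribution, 0 = disjoint.

    For each bin, compute the entropy of "which window did this instance come
    from", then average over bins weighted by the bin's share of the data.
    """
    lo = min(min(prev), min(curr))
    hi = max(max(prev), max(curr))
    prev_counts = Counter(bin_index(v, lo, hi, n_bins) for v in prev)
    curr_counts = Counter(bin_index(v, lo, hi, n_bins) for v in curr)
    score = 0.0
    for b in set(prev_counts) | set(curr_counts):
        p_old = prev_counts[b] / len(prev)   # share of the previous window in bin b
        p_new = curr_counts[b] / len(curr)   # share of the current window in bin b
        mass = p_old + p_new
        h = 0.0
        for p in (p_old, p_new):
            if p > 0:
                q = p / mass                 # window-membership probability within bin b
                h -= q * math.log2(q)
        score += (mass / 2) * h
    return score


def drift_detected(prev_window, curr_window, threshold=0.8, n_bins=10):
    """Flag drift when the mean per-feature score falls below a hypothetical threshold."""
    n_features = len(prev_window[0])
    scores = []
    for j in range(n_features):
        prev = [row[j] for row in prev_window]
        curr = [row[j] for row in curr_window]
        scores.append(membership_entropy(prev, curr, n_bins))
    return sum(scores) / n_features < threshold
```

Each window here is a list of feature vectors (one list of numbers per flow). In a pipeline of the kind the abstract outlines, a `True` result would mark a drift point at which a classifier trained on the current window is added to the ensemble; that step is not shown.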