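Affiliation: [1] National Digital Switching System Engineering and Technological Research Center, Zhengzhou 450002
Source: Journal of Computer Applications (《计算机应用》), 2018, No. 1, pp. 20-25 (6 pages)
Funding: National Science and Technology Major Project (2016ZX01012101); National Natural Science Foundation of China, General Program (61572520); National Natural Science Foundation of China, Innovative Research Groups Program (61521003).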
Abstract: To address the traffic imbalance caused by the flood of Peer-to-Peer (P2P) traffic in networks, where P2P flows far outnumber non-P2P flows, the idea of imbalanced data classification is applied to traffic identification. By introducing and improving the Synthetic Minority Over-sampling Technique (SMOTE) algorithm, a Mean SMOTE (M-SMOTE) algorithm is proposed to balance the traffic data. On this basis, three machine learning classifiers, Random Forest (RF), Support Vector Machine (SVM), and Back Propagation Neural Network (BPNN), are used to identify each type of traffic after processing. Theoretical analysis and simulation results show that, without affecting the recognition accuracy of P2P traffic, and compared with the imbalanced state, the SMOTE algorithm improves the recognition accuracy of non-P2P traffic by 16.5 percentage points on average and raises the overall recognition rate of network traffic by 9.5 percentage points. Compared with the SMOTE algorithm, the M-SMOTE algorithm further improves the recognition accuracy of non-P2P traffic and the overall recognition rate of network traffic by 3.2 percentage points and 2.6 percentage points respectively. The experimental results show that imbalanced data classification can effectively solve the problem of low non-P2P traffic recognition rates caused by excessive P2P traffic, and that the proposed M-SMOTE algorithm achieves higher recognition accuracy than SMOTE.
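Keywords: imbalanced data; P2P traffic; traffic identification; machine learning; Synthetic Minority Over-sampling Technique (SMOTE) algorithm
Classification number: TP393.02 [Automation and Computer Technology: Computer Application Technology]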
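Note: The abstract relies on SMOTE to balance the minority (non-P2P) class before classification. As a rough illustration only, the following is a minimal sketch of standard SMOTE, which creates each synthetic sample by interpolating between a minority sample and one of its k nearest minority neighbours. The abstract does not say how M-SMOTE modifies this step, so that variant is not attempted here. The function name `smote`, its parameters, and the two-dimensional feature vectors are assumptions made for the example, not taken from the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points built by standard SMOTE interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    This is a baseline sketch, not the paper's M-SMOTE variant.
    """
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # For every minority sample, find the indices of its k nearest
    # minority neighbours by Euclidean distance (excluding itself).
    neighbours = []
    for i, point in enumerate(minority):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(point, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(n)           # pick a base sample
        j = rng.choice(neighbours[i])  # pick one of its neighbours
        gap = rng.random()             # interpolation factor in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


# Hypothetical non-P2P feature vectors (values are illustrative only).
non_p2p = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extra = smote(non_p2p, n_synthetic=10, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already covered by the minority class; the enlarged minority set would then be combined with the majority (P2P) samples before training a classifier such as RF, SVM, or BPNN.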