Traffic Classification Method Based on Dynamic Balance Adaptive Transfer Learning

Cited by: 3


Authors: SHANG Fengjun[1]; LI Saisai; WANG Ying[1]; CUI Yunfan

Affiliation: [1] School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400000, China

Source: Journal of Electronics & Information Technology, 2022, No. 9, pp. 3308-3319 (12 pages)

Funding: National Natural Science Foundation of China (61672004).

Abstract: To address degraded performance and accuracy in application traffic identification, this paper proposes a traffic classification algorithm based on dynamically balanced adaptive transfer learning. First, transfer learning is introduced into application traffic identification: the sample features of the source and target domains are mapped into a high-dimensional feature space so that the distances between the two domains' marginal distributions and conditional distributions are as small as possible. A probabilistic model is used to judge and measure the difference between the domains' marginal and conditional distributions, and the model's confidence in each predicted class is used to compute the balance factor μ quantitatively. This addresses the limitation that DDA considers only the classification error rate and ignores confidence. Next, a cliff-drop strategy is introduced to determine the number of principal feature components dynamically. The transformed features are used to train a base classifier, and after iterative training the final classifier is applied to identifying the latest mobile terminal applications, improving accuracy by about 7% on average over traditional machine learning methods. Finally, to handle high feature dimensionality, a reverse feature self-deletion strategy is combined with the Earth Mover's Distance (EMD), using an information-gain-weighted EMD correlation coefficient, to give a feature selection algorithm for application traffic identification. The algorithm removes features that contribute nothing to classification and only lengthen training, as well as irrelevant features that degrade model performance and accuracy. Simulation results show that when the selected feature set is used as the training input for transfer learning, the running time of the transfer algorithm is shortened by about 80%.
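The role of the balance factor μ can be illustrated with a small sketch. It assumes the commonly used balanced form, (1 - μ) times a marginal-distribution distance plus μ times a sum of per-class conditional distances, with each distance taken as the squared gap between feature means (a linear-kernel MMD estimate). These choices, and the names `balanced_distance` and `sq_mean_gap`, are assumptions made for illustration. The abstract does not give the paper's formulas, and μ is treated here as a plain input rather than derived from classifier confidence.

```python
from collections import defaultdict


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_mean_gap(a, b):
    """Squared Euclidean distance between the means of two sample sets
    (assumed stand-in for a distribution distance)."""
    ma, mb = mean_vector(a), mean_vector(b)
    return sum((x - y) ** 2 for x, y in zip(ma, mb))


def group_by_label(rows, labels):
    """Collect feature vectors under their class label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def balanced_distance(xs, ys, xt, yt_pseudo, mu):
    """(1 - mu) * marginal gap + mu * sum of per-class conditional gaps.

    yt_pseudo holds pseudo-labels for the target samples, since true
    target labels are assumed to be unavailable.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    marginal = sq_mean_gap(xs, xt)
    src = group_by_label(xs, ys)
    tgt = group_by_label(xt, yt_pseudo)
    # Only classes present in both domains contribute to the conditional term.
    conditional = sum(sq_mean_gap(src[c], tgt[c]) for c in src if c in tgt)
    return (1.0 - mu) * marginal + mu * conditional
```

With μ = 0 only the marginal term counts, with μ = 1 only the conditional term counts, and intermediate values trade the two off. That weighting is what the abstract describes as being set from classification confidence.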

Keywords: transfer learning; application identification; feature extraction; cross-domain

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
