Research on Progressive Learning Algorithm for Unbalanced Data Based on KL Distance


Authors: ZHAO Xiang-bing (赵向兵), ZHOU Jian-hui (周建慧), YANG Ze-min (杨泽民)[1]

Affiliation: [1] School of Computer and Network Engineering, Shanxi Datong University, Datong, Shanxi 037009, China

Source: Computer Simulation (《计算机仿真》), 2021, No. 12, pp. 291-294 (4 pages)

Abstract: Because of the inherent nature of unbalanced data, classification results tend to be dominated by the majority (strong) class. This paper studies a progressive learning algorithm for unbalanced data based on KL (Kullback-Leibler) distance. The aim is to identify minority (weak) class samples accurately and to improve classification performance. After an analysis of KL distance and of under-sampling, under-sampling is used to balance the data set. A KL-distance-based semi-supervised learning algorithm for unbalanced data then classifies the processed data set progressively, by repeatedly identifying reliable positive examples and reliable negative examples. Experimental results show that the algorithm consistently achieves high G-mean values, indicating substantially improved classification performance. At every sampling ratio, the F-measure obtained with the proposed algorithm is higher than the F-measure obtained without it. At larger sampling ratios the F-measure continues to rise slowly, so the algorithm separates the minority-class samples in unbalanced data sets well.
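The abstract outlines the method but not its exact formulas: first under-sample, then progressively pick reliable positive and negative examples by KL distance. The Python sketch below illustrates that outline under several assumptions that do not come from the paper:

- Each sample is a non-negative count vector, turned into a Laplace-smoothed probability distribution.
- Each class is summarised by the mean distribution of its labelled samples.
- A sample's score is KL(x || negative prototype) minus KL(x || positive prototype).
- Plain random under-sampling supplies the initial negative set.

All function names and parameters (`to_distribution`, `progressive_label`, `ratio`, `k`, and so on) are hypothetical. G-mean and F-measure use their standard definitions.

```python
import math
import random

# Hypothetical sketch of the outline in the abstract; not the authors' implementation.
ALPHA = 1.0  # Laplace pseudo-count (assumption) so KL stays finite when a count is zero


def to_distribution(counts):
    """Turn a non-negative count vector into a Laplace-smoothed probability distribution."""
    total = sum(counts) + ALPHA * len(counts)
    return [(c + ALPHA) / total for c in counts]


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i); asymmetric, so only loosely a 'distance'."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def prototype(samples):
    """Mean smoothed distribution of a non-empty labelled set (hypothetical class prototype)."""
    dists = [to_distribution(s) for s in samples]
    return [sum(col) / len(dists) for col in zip(*dists)]


def random_undersample(majority, minority, ratio=1.0, seed=0):
    """Randomly keep round(ratio * len(minority)) majority samples (at least one)."""
    k = min(len(majority), max(1, round(ratio * len(minority))))
    return random.Random(seed).sample(majority, k)


def progressive_label(pos, neg, unlabeled, k=1):
    """Label `unlabeled` round by round (1 = minority/positive, 0 = majority/negative)."""
    pos, neg = list(pos), list(neg)
    pool = list(range(len(unlabeled)))  # indices not yet labelled
    labels = [None] * len(unlabeled)
    while pool:
        p_proto, n_proto = prototype(pos), prototype(neg)
        # score > 0 means the sample is closer, in KL terms, to the positive prototype
        scores = {}
        for i in pool:
            d = to_distribution(unlabeled[i])
            scores[i] = kl_divergence(d, n_proto) - kl_divergence(d, p_proto)
        ranked = sorted(pool, key=scores.get)  # most negative-looking first
        reliable_neg = [i for i in ranked[:k] if scores[i] < 0]
        reliable_pos = [i for i in ranked[::-1][:k] if scores[i] > 0]
        if not reliable_neg and not reliable_pos:
            # nothing confident left: label the remainder by the sign of its score
            for i in pool:
                labels[i] = 1 if scores[i] > 0 else 0
            break
        # move the reliable examples into the labelled sets so the prototypes update
        for i in reliable_pos:
            labels[i] = 1
            pos.append(unlabeled[i])
        for i in reliable_neg:
            labels[i] = 0
            neg.append(unlabeled[i])
        chosen = set(reliable_pos) | set(reliable_neg)
        pool = [i for i in pool if i not in chosen]
    return labels


def g_mean_and_f_measure(y_true, y_pred, positive=1):
    """G-mean = sqrt(recall * specificity); F-measure = 2PR / (P + R) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return math.sqrt(recall * specificity), f


if __name__ == "__main__":
    minority = [[5, 1, 0], [4, 2, 0]]                        # toy positive (minority) counts
    majority = [[0, 1, 5], [0, 2, 4], [1, 1, 6], [0, 0, 3]]  # toy negative (majority) counts
    neg_seed = random_undersample(majority, minority, ratio=1.0)
    unlabeled = [[6, 1, 0], [0, 1, 7], [5, 2, 1], [1, 0, 5]]
    y_pred = progressive_label(minority, neg_seed, unlabeled, k=1)
    print(y_pred, g_mean_and_f_measure([1, 0, 1, 0], y_pred))
```

KL divergence is asymmetric, so the direction of each comparison is itself a design choice. This sketch measures the divergence from the sample to each prototype and assumes both seed sets are non-empty.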

Keywords: unbalanced data; progressive learning algorithm; minority (weak) class; under-sampling

Classification number (CLC): TP391 [Automation and Computer Technology: Computer Application Technology]
