Imbalance-Aware Ensemble Feature Selection Algorithm Based on Multi-Criteria Decision Making

Published English title: Imbalance-Aware Data Based on Multi-Criteria Decision Making Integrated Feature Selection Algorithm


Authors: Wang Gang (王刚), Ren Liping (任丽萍), Fang Li (方力), Xu Weilei (徐维磊)

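Affiliations: [1] College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu [2] Nantong Sizhen Electronic Technology Co., Ltd. (南通思振电子科技有限公司), Nantong, Jiangsu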

Source: Journal of Sensor Technology and Application (《传感器技术与应用》), 2023, No. 6, pp. 538-549 (12 pages)

Abstract: Imbalanced data are prevalent in data mining, and such data are often both high-dimensional and class-imbalanced. The skewed distribution of feature attributes in an imbalanced dataset degrades classification performance, while high dimensionality makes learning algorithms very time-consuming. To address these problems, a feature selection method based on combined sampling and ensemble learning is proposed. First, a combined sampling method handles the class imbalance: it focuses on synthesizing minority-class samples and removes noisy samples while keeping the dataset balanced. Ensemble feature selection is then modeled as a multi-criteria decision-making process, and the VIKOR method is used to obtain a feature importance ranking. Next, during sequential forward search over the features, the accuracy of the XGBoost algorithm serves as the criterion for evaluating candidate feature subsets and determining the optimal one. With AUC, G-mean, and F-measure as evaluation metrics, experiments on five imbalanced datasets confirm that the proposed algorithm achieves better classification performance and that the model is more robust.
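The ranking and search steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which criteria, weights, or VIKOR compromise coefficient were used, so the score matrix, the equal weights, and `v = 0.5` below are assumptions. The `toy_accuracy` function is a hypothetical stand-in for the cross-validated XGBoost accuracy, and evaluating ranked prefixes is only one plausible reading of "sequential forward search".

```python
def vikor_rank(scores, weights, v=0.5):
    """Rank alternatives (here: features) with VIKOR, best first.

    scores[i][j] is the benefit-type score of feature i under criterion j
    (larger is better). v weighs group utility against individual regret;
    0.5 is an assumed default, not a value taken from the paper.
    """
    n_crit = len(weights)
    best = [max(row[j] for row in scores) for j in range(n_crit)]
    worst = [min(row[j] for row in scores) for j in range(n_crit)]

    S, R = [], []  # group utility and individual regret per feature
    for row in scores:
        terms = []
        for j in range(n_crit):
            span = best[j] - worst[j]
            # A criterion on which all features tie contributes nothing.
            terms.append(weights[j] * (best[j] - row[j]) / span if span else 0.0)
        S.append(sum(terms))
        R.append(max(terms))

    def scaled(values):
        lo, hi = min(values), max(values)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in values]

    # Compromise index Q: a lower value means a better-ranked feature.
    Q = [v * s + (1 - v) * r for s, r in zip(scaled(S), scaled(R))]
    return sorted(range(len(scores)), key=lambda i: Q[i])


def forward_select(ranking, evaluate):
    """Grow the subset in ranked order and keep the best-scoring prefix."""
    subset, best_subset, best_score = [], [], float("-inf")
    for feature in ranking:
        subset.append(feature)
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score


# Hypothetical scores of 4 features under 3 feature-selection criteria.
scores = [
    [0.9, 0.8, 0.7],  # feature 0
    [0.2, 0.1, 0.3],  # feature 1
    [0.6, 0.9, 0.5],  # feature 2
    [0.4, 0.3, 0.1],  # feature 3
]
weights = [1 / 3, 1 / 3, 1 / 3]  # assumed equal criterion weights


def toy_accuracy(subset):
    """Hypothetical stand-in for the accuracy of XGBoost on `subset`."""
    gain = {0: 0.10, 1: -0.01, 2: 0.05, 3: -0.02}
    return 0.70 + sum(gain[f] for f in subset)


ranking = vikor_rank(scores, weights)
subset, score = forward_select(ranking, toy_accuracy)
print(ranking, subset, round(score, 2))
```

In practice, `toy_accuracy` would be replaced by a function that trains a classifier on the resampled data restricted to `subset` and returns its validation accuracy.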

Keywords: imbalanced data classification; combined sampling; multi-criteria decision making; VIKOR method; sequential forward selection

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
