Feature Selection Algorithm for High Dimensional Unbalanced Data (面向高维不平衡数据的特征选择算法)

Cited by: 1

Authors: WANG Zhenfei (王振飞)[1], YUAN Peiyao (袁佩瑶), CAO Zhongya (曹中亚), ZHANG Liying (张利莹)

Affiliation: [1] School of Computing and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2024, No. 8, pp. 1839-1846 (8 pages)

Funding: Supported by the National Natural Science Foundation of China (62276238).

Abstract: Traditional classification algorithms for high-dimensional imbalanced datasets tend to favor the majority class and neglect the minority class. To address this problem, the paper proposes DBIM, a feature selection algorithm based on density clustering and importance measurement. DBIM first constructs multiple balanced subsets by random down-sampling and uses the DBSCAN density clustering method as the base classifier to generate an initial feature subspace. It then ranks the features by importance and selects those with strong classification ability. Finally, to avoid redundancy among features, DBIM combines a weight index based on class distribution with a redundancy evaluation index to generate a high-quality feature subset. Experimental results on eight publicly available datasets show that DBIM generates feature subsets with high relevance and low redundancy, effectively reduces the dimensionality of high-dimensional imbalanced datasets, and improves classification performance.
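The abstract describes the pipeline only at a high level, so the following is a minimal sketch of its general shape rather than the DBIM algorithm itself. It shows random down-sampling into balanced subsets followed by a greedy selection that trades relevance against redundancy. Several pieces are assumptions made for illustration: the function names (`balanced_subsets`, `pearson`, `select_features`), the use of absolute Pearson correlation as a stand-in for both the importance measure and the redundancy index, and the "relevance minus mean redundancy" score. The DBSCAN-based base classifier and the class-distribution weight index are not specified in the abstract and are not reproduced here. Labels are assumed to be numeric (for example 0 and 1).

```python
import random
from statistics import mean, pstdev


def balanced_subsets(X, y, n_subsets=5, seed=0):
    """Random down-sampling: each subset keeps every minority sample plus an
    equally sized random draw from each larger class."""
    rng = random.Random(seed)
    labels = sorted(set(y))
    by_class = {c: [i for i, lab in enumerate(y) if lab == c] for c in labels}
    minority = min(labels, key=lambda c: len(by_class[c]))
    n_min = len(by_class[minority])
    subsets = []
    for _ in range(n_subsets):
        idx = list(by_class[minority])
        for c in labels:
            if c != minority:
                idx += rng.sample(by_class[c], n_min)
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0
    ma, mb = mean(a), mean(b)
    cov = mean((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / (sa * sb)


def select_features(X, y, k, n_subsets=5, seed=0):
    """Greedy selection of k feature indices.

    Relevance (assumed proxy): mean |corr(feature, label)| over the balanced
    subsets. Redundancy (assumed proxy): mean |corr| with already selected
    features on the full data.
    """
    n_features = len(X[0])
    relevance = [0.0] * n_features
    for Xs, ys in balanced_subsets(X, y, n_subsets, seed):
        for j in range(n_features):
            col = [row[j] for row in Xs]
            relevance[j] += abs(pearson(col, ys)) / n_subsets

    cols = [[row[j] for row in X] for j in range(n_features)]
    selected = []
    candidates = set(range(n_features))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = mean(abs(pearson(cols[j], cols[s])) for s in selected)
        return relevance[j] - redundancy

    while candidates and len(selected) < k:
        best = max(sorted(candidates), key=score)  # ties go to the lowest index
        selected.append(best)
        candidates.remove(best)
    return selected
```

Calling `select_features(X, y, k)` on a list of numeric feature rows `X` and numeric labels `y` returns the chosen column indices in selection order. Scoring relevance on the balanced subsets rather than on the full data is what keeps the majority class from dominating the ranking; a feature that duplicates an already selected one is penalised by the redundancy term.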

Keywords: high-dimensional imbalanced datasets; density clustering; feature selection; relevance; redundancy

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
