An Improved Criterion for Evaluating the Discernibility of a Feature Subset  (Cited by: 1)

Authors: XIE Juan-Ying (谢娟英)[1]; WU Zhao-Zhong (吴肇中); ZHENG Qing-Quan (郑清泉); WANG Ming-Zhao (王明钊)[1,2]

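Affiliations: [1] School of Computer Science, Shaanxi Normal University, Xi'an 710119; [2] College of Life Sciences, Shaanxi Normal University, Xi'an 710119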

Source: Acta Automatica Sinica (《自动化学报》), 2022, No. 5, pp. 1292-1306 (15 pages)

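Funding: Supported by the National Natural Science Foundation of China (62076159, 12031010, 61673251) and the Fundamental Research Funds for the Central Universities (GK202105003).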

Abstract: The discernibility of feature subsets (DFS) criterion does not account for the influence of feature measurement scales on the discriminative ability of a feature subset. To overcome this deficiency, this paper introduces the coefficient of variation and proposes the generalized discernibility of feature subsets (GDFS) criterion. GDFS is combined with four search strategies, namely sequential forward search (SFS), sequential backward search (SBS), sequential forward floating search (SFFS), and sequential backward floating search (SBFS), to obtain four hybrid feature selection algorithms, with the extreme learning machine (ELM) adopted as the classifier that guides the feature selection process. The classification capability of the feature subsets selected by GDFS is tested on datasets from the UCI machine learning repository and on classic gene expression datasets. ELM classifiers built on the feature subsets selected by GDFS are compared with those built on subsets selected by DFS, Relief, DRJMIM, mRMR, LLE Score, AVC, SVM-RFE, VMInaive, AMID, AMID-DWSFS, CFR, and FSSC-SD, and statistical significance tests are conducted among all of these methods. The experimental results show that the proposed GDFS is superior to the original DFS and selects feature subsets with better classification performance.
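The abstract names two building blocks without giving formulas: the coefficient of variation and sequential forward search. The following is a minimal Python sketch of both, under stated assumptions. It is not the paper's GDFS criterion, whose definition does not appear in this record. The function names and the `toy_score` stand-in are hypothetical, and no ELM classifier is involved. The first part shows why the coefficient of variation is insensitive to measurement units; the second shows the greedy structure of SFS for an arbitrary subset-scoring function.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / abs(m)


def sequential_forward_search(
    n_features: int,
    score: Callable[[List[int]], float],
    k: int,
) -> List[int]:
    """Greedily add the feature that maximizes score(subset) until k are chosen.

    `score` is any subset-evaluation function; in the paper this role is
    played by GDFS, which is NOT reproduced here.
    """
    selected: List[int] = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# The same measurements expressed in metres and in centimetres.
heights_m = [1.50, 1.60, 1.70, 1.80]
heights_cm = [h * 100 for h in heights_m]

# The standard deviation scales with the unit; the coefficient of variation does not.
sd_m, sd_cm = pstdev(heights_m), pstdev(heights_cm)
cv_m = coefficient_of_variation(heights_m)
cv_cm = coefficient_of_variation(heights_cm)

# Hypothetical additive score standing in for a real subset criterion.
toy_weights = [0.2, 0.9, 0.5]


def toy_score(subset: List[int]) -> float:
    return sum(toy_weights[f] for f in subset)


chosen = sequential_forward_search(n_features=3, score=toy_score, k=2)
```

Here `sd_cm` is about 100 times `sd_m`, while `cv_m` and `cv_cm` agree up to floating-point error, which is the scale-independence property the abstract appeals to. With the toy score, SFS picks feature 1 first and then feature 2.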

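Keywords: discernibility of feature subsets; feature selection; coefficient of variation; extreme learning machine; feature search strategy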

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
