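Feature Selection for High-Dimensional and Small-Sample-Size Data Based on Hybrid Genetic Algorithm and Mutual Information Analysis  (Cited by: 6)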



Authors: Yao Shuchun [1], Liu Zheng [1,2], Zhang Qiang

Affiliations: [1] School of Information Engineering, Suzhou Industrial Park Institute of Services Outsourcing, Suzhou 215123, Jiangsu, China; [2] School of Electronic and Information Engineering, Soochow University, Suzhou 215006, Jiangsu, China; [3] Suzhou Maxnet Network Security Technology Co., Ltd., Suzhou 215123, Jiangsu, China

Source: Computer Applications and Software, 2020, No. 1, pp. 247-255 (9 pages)

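Funding: National Natural Science Foundation of China (61876117); Teaching Reform Project of Suzhou Industrial Park Institute of Services Outsourcing (JG-201706); Jiangsu Universities "Qing Lan Project"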

Abstract: Feature selection on high-dimensional, small-sample-size data suffers from high feature redundancy and overfitting. To address these problems, we propose a feature selection algorithm for high-dimensional small-sample data based on a hybrid genetic algorithm and mutual information analysis. We analyze mutual information theory and the feature selection problem in depth and, exploiting the strength of mutual information in eliminating feature redundancy, derive a mutual-information-based objective function and optimized boundary conditions. We then design a hybrid genetic algorithm that makes full use of attribute data from different perspectives of high-dimensional small-sample datasets. The hybrid genetic algorithm maintains a main population and a sub-population; in each iteration, the results of the sub-population guide the evolution of the main population, which alleviates the overfitting caused by small sample sizes. Comparative experiments on medical datasets show that the algorithm effectively enhances the stability and robustness of the genetic algorithm and achieves good feature selection results.
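The abstract does not give the exact objective function, the boundary conditions, or how the sub-population sees the data, so the following is only a minimal Python sketch of the general idea under stated assumptions. It swaps in an mRMR-style score (mean mutual information between each selected feature and the class, minus mean pairwise mutual information among the selected features) for the paper's objective, uses a hypothetical subset-size bound `max_features` as the boundary condition, gives the sub-population a bootstrap resample of the samples as its different "view", and implements guidance as migration of the sub-population's best individual into the main population. All names (`mutual_information`, `fitness`, `hybrid_ga`, `max_features`) are illustrative and do not come from the paper.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Estimate I(X;Y) in nats from two equal-length sequences of discrete values."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # p(a,b) * log(p(a,b) / (p(a) p(b))), with probabilities written as counts / n
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def fitness(mask, X, y, max_features):
    """mRMR-style score (an assumption): mean relevance minus mean pairwise redundancy."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not 1 <= len(idx) <= max_features:  # hypothetical boundary condition on subset size
        return float("-inf")
    cols = [[row[j] for row in X] for j in idx]
    relevance = sum(mutual_information(c, y) for c in cols) / len(cols)
    pairs = [(a, b) for a in range(len(cols)) for b in range(a + 1, len(cols))]
    redundancy = (sum(mutual_information(cols[a], cols[b]) for a, b in pairs) / len(pairs)
                  if pairs else 0.0)
    return relevance - redundancy


def random_mask(n_features, max_features, rng):
    chosen = set(rng.sample(range(n_features), rng.randint(1, max_features)))
    return [1 if j in chosen else 0 for j in range(n_features)]


def evolve(pop, score, rng, p_mut):
    """One generation: elitism, binary tournament, uniform crossover, bit-flip mutation."""
    nxt = [max(pop, key=score)[:]]  # index 0 always holds the elite
    while len(nxt) < len(pop):
        a = max(rng.sample(pop, 2), key=score)
        b = max(rng.sample(pop, 2), key=score)
        child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]
        nxt.append([1 - g if rng.random() < p_mut else g for g in child])
    return nxt


def cached_score(X, y, max_features):
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X, y, max_features)
        return cache[key]

    return score


def hybrid_ga(X, y, max_features=3, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    # Assumption: the sub-population is scored on a bootstrap resample of the samples,
    # one simple way to give it a different view of a small dataset.
    boot = [rng.randrange(len(X)) for _ in range(len(X))]
    score_main = cached_score(X, y, max_features)
    score_sub = cached_score([X[i] for i in boot], [y[i] for i in boot], max_features)
    main = [random_mask(n_features, max_features, rng) for _ in range(pop_size)]
    sub = [random_mask(n_features, max_features, rng) for _ in range(pop_size)]
    for _ in range(generations):
        sub = evolve(sub, score_sub, rng, p_mut)
        main = evolve(main, score_main, rng, p_mut)
        # Guidance step (assumed form): the sub-population's best individual replaces
        # the main population's worst non-elite individual.
        worst = min(range(1, pop_size), key=lambda i: score_main(main[i]))
        main[worst] = max(sub, key=score_sub)[:]
    return max(main, key=score_main)


if __name__ == "__main__":
    rng = random.Random(1)
    y = [rng.randint(0, 1) for _ in range(30)]
    # Feature 0 copies the label, feature 1 duplicates feature 0, features 2-9 are noise.
    X = [[c, c] + [rng.randint(0, 2) for _ in range(8)] for c in y]
    best = hybrid_ga(X, y)
    print("selected features:", [j for j, bit in enumerate(best) if bit])
```

On the toy data, feature 1 is an exact copy of feature 0, so the redundancy term cancels most of the gain from selecting both; a subset containing only one of them typically scores best. The counting estimator assumes discrete features, so continuous microarray expression values would need to be discretized first.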

Keywords: high-dimensional small-sample data; feature selection; mutual information; genetic algorithm; overfitting; microarray data

CLC number: TP391 (Automation and Computer Technology: Computer Application Technology)

 
