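Affiliations: [1] School of Science, Wuhan University of Technology, Wuhan 430070; [2] School of Mathematics and Statistics, Wuhan University, Wuhan 430072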
Source: Journal of Computer Applications, 2021, No. 9, pp. 2658-2667 (10 pages)
Funding: General Program of the National Natural Science Foundation of China (61672391).
Abstract: General feature selection algorithms do not reveal an interpretable mapping between data features and data categories. To address this problem, an improved Feature Selection classification algorithm based on Layer Distance for Gene Expression Programming (GEP), named FSLDGEP, was proposed by introducing new initialization, mutation and fitness evaluation methods into GEP. Firstly, a selection probability was defined to initialize the individuals of the population in a guided way, which increases the number of valid individuals in the population. Secondly, the layer neighborhood of an individual was defined, and each individual mutates within its layer neighborhood, which removes the blind, unguided nature of the mutation process. Finally, the dimension reduction rate and the classification accuracy were combined into the fitness value of each individual, which replaces the single-objective evolutionary mode of the population and balances the two objectives. 5-fold and 10-fold cross-validation were performed on 7 datasets; the proposed algorithm produced an explicit functional mapping between the data features and their categories, and this mapping function was used to classify the data. Compared with Feature Selection based on Forest Optimization Algorithm (FSFOA), feature evaluation and selection based on Neighborhood Soft Margin (NSM), Feature Selection based on Neighborhood Effective Information Ratio (FS-NEIR) and other comparison algorithms, the proposed algorithm achieved the best dimension reduction rate on the Hepatitis, Wisconsin Prognostic Breast Cancer (WPBC), Sonar and Wisconsin Diagnostic Breast Cancer (WDBC) datasets, and the best average classification accuracy on the Hepatitis, Ionosphere, Musk1, WPBC, Heart-Statlog and WDBC datasets. The experimental results verify the feasibility, effectiveness and superiority of the proposed algorithm for feature selection and classification.
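The abstract states that FSLDGEP yields an explicit mapping function from features to classes and scores each individual by combining the dimension reduction rate with the classification accuracy, but it does not give the formulas. The following is a minimal sketch of that general idea under loudly stated assumptions, not the paper's method: a standard GEP K-expression (Karva notation) is decoded breadth-first into an expression tree over feature terminals `x0`, `x1`, ...; the function set {+, -, *} is hypothetical; a sample is assigned class 1 when the mapping function is positive (binary case only); the dimension reduction rate is taken as 1 minus the fraction of features that appear in the expressed tree; and the fitness is a hypothetical weighted sum with weight `w`. The paper's selection-probability initialization and layer-neighborhood mutation are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical function set for this sketch: symbol -> (arity, implementation).
FUNCTIONS: Dict[str, Tuple[int, Callable[..., float]]] = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
}


def parse_orf(gene: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Decode a K-expression breadth-first.

    Returns the open reading frame (the symbols actually expressed) and, for
    each expressed node, the gene positions of its children. Assumes the gene
    is valid, i.e. its tail is long enough to supply every needed terminal.
    """
    children: List[List[int]] = []
    next_pos = 1  # first gene position not yet assigned to a parent
    i = 0
    while i < next_pos:  # positions 0..next_pos-1 belong to the tree
        arity = FUNCTIONS[gene[i]][0] if gene[i] in FUNCTIONS else 0
        children.append(list(range(next_pos, next_pos + arity)))
        next_pos += arity
        i += 1
    return list(gene[:next_pos]), children


def evaluate(orf: List[str], children: List[List[int]], node: int,
             x: Sequence[float]) -> float:
    """Evaluate the subtree rooted at `node` on one sample `x`."""
    sym = orf[node]
    if sym in FUNCTIONS:
        fn = FUNCTIONS[sym][1]
        return fn(*(evaluate(orf, children, c, x) for c in children[node]))
    return x[int(sym[1:])]  # terminal "xK" reads feature K


def fitness(gene: Sequence[str], X: List[Sequence[float]], y: List[int],
            n_features: int, w: float = 0.9) -> Tuple[float, float, float]:
    """Return (fitness, accuracy, dimension reduction rate).

    Assumptions (not from the paper): class 1 iff the mapping function is
    positive; DRR = 1 - (#distinct features used) / n_features; fitness is
    the weighted sum w * accuracy + (1 - w) * DRR with a hypothetical w.
    """
    orf, children = parse_orf(gene)
    preds = [1 if evaluate(orf, children, 0, row) > 0 else 0 for row in X]
    acc = sum(p == t for p, t in zip(preds, y)) / len(y)
    used = {s for s in orf if s not in FUNCTIONS}
    drr = 1 - len(used) / n_features
    return w * acc + (1 - w) * drr, acc, drr


if __name__ == "__main__":
    # Head length 2, tail length 3; only "-", "x0", "x2" are expressed,
    # so the decoded mapping function is f(x) = x0 - x2.
    gene = ["-", "x0", "x2", "x1", "x3"]
    X = [[3.0, 0.0, 1.0, 0.0], [0.5, 9.0, 2.0, 9.0],
         [2.0, 0.0, 0.0, 0.0], [1.0, 0.0, 4.0, 0.0]]
    y = [1, 0, 1, 0]
    print(fitness(gene, X, y, n_features=4))
```

In this toy run the decoded function uses 2 of 4 features (DRR 0.5) and classifies all four samples correctly (accuracy 1.0). Because the terminals actually expressed in the tree are exactly the selected features, a single GEP individual encodes both the feature subset and the interpretable classifier, which is why a combined fitness can trade the two objectives off.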
Keywords: feature selection; function discovery; Gene Expression Programming (GEP); population initialization; layer neighborhood
CLC number: TP317.4 [Automation and Computer Technology: Computer Software and Theory]