Authors: SUN Lin [1]; GUO Jiaqi [2]; ZHU Yuchen [2]; CHEN Sen [1]
Affiliations: [1] College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China; [2] College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China
Source: Journal of Shanxi University (Natural Science Edition), 2024, No. 1, pp. 93-102 (10 pages)
Funding: National Natural Science Foundation of China (61772176); Science and Technology Research Project of Henan Province (212102210136).
Abstract: The optimal feature subset of a high-dimensional gene dataset is hard to determine, and the traditional Bayesian optimization algorithm is prone to falling into local optima, so it cannot quickly find the optimal parameters. To address these problems, this paper proposes a gene selection method based on a Stacking ensemble and partial exploration Bayesian optimization. First, a Chi-square filter is used to eliminate redundant genes from the original feature space and retain the genes with high correlation. The acquisition function of the Bayesian optimization algorithm is then improved by introducing a jump-out coefficient, so that the algorithm can adaptively escape local optima, which reduces cost and speeds up the search. Second, the partial exploration Bayesian optimization is used to find the optimal parameters of a random forest, and the optimized random forest model is used to screen the optimal gene subset. Finally, a Stacking ensemble framework is designed to construct the classifier and classify the optimal gene subset, yielding a gene selection algorithm based on the Stacking ensemble and partial exploration Bayesian optimization. Simulation experiments on nine public gene expression profile datasets show that the proposed algorithm can quickly select the optimal gene subset and achieves high classification accuracy.
Keywords: gene selection; Stacking algorithm; Bayesian optimization algorithm; random forest model
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
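Illustrative note: the first step described in the abstract, Chi-square filtering of genes, can be sketched roughly as follows. This is not the authors' implementation. It assumes that expression values have already been discretized (here to 0/1), which the record does not state, and the names `chi2_score` and `select_top_k` are hypothetical helpers introduced only for this example. The later steps (the jump-out coefficient, random forest tuning, and the Stacking classifier) are not specified in enough detail here to sketch.

```python
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-square statistic of one discrete feature against class labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    class_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f_value, n_f in feature_counts.items():
        for c_value, n_c in class_counts.items():
            # Expected cell count if the feature and the class were independent.
            expected = n_f * n_c / n
            score += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return score


def select_top_k(X, y, k):
    """Return (sorted indices of the k highest-scoring columns, all scores)."""
    n_features = len(X[0])
    scores = [chi2_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Toy data (assumed, not from the paper): 6 samples, 3 discretized "genes".
X = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
y = [1, 1, 1, 0, 0, 0]

selected, scores = select_top_k(X, y, k=1)
print(selected, [round(s, 3) for s in scores])
```

In this toy example the first column matches the class labels exactly, so it receives the highest score and is the one retained; the other two columns are only weakly associated with the labels and are filtered out.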