Author(s): WANG Sheng-wu; CHEN Hong-mei[1,2] (School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Key Laboratory of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu 611756, China)
Affiliations: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; [2] Key Laboratory of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu 611756, China
Source: Computer Science (《计算机科学》), 2020, No. 2, pp. 44-50 (7 pages)
Fund: National Natural Science Foundation of China (61572406)
Abstract: With the development of the Internet and Internet of Things technologies, data collection has become increasingly easy. However, high-dimensional data contain many redundant and irrelevant features; using such data directly increases a model's computational cost and can even degrade its performance, so dimensionality reduction is necessary. Feature selection reduces computational cost and removes redundant features by reducing the feature dimension, thereby improving the performance of machine learning models; because it retains the original features of the data, it also offers good interpretability. Feature selection has thus become one of the important data preprocessing steps in machine learning. Rough set theory is an effective method for feature selection: it preserves the characteristics of the original features by removing redundant information. However, because the cost of evaluating all feature-subset combinations is high, traditional rough-set-based feature selection methods can hardly find the globally optimal feature subset. To address this problem, this paper proposes a feature selection method based on rough sets and an improved whale optimization algorithm. To keep the whale optimization algorithm from falling into local optima, the improvement introduces population optimization and disturbance strategies. The algorithm first randomly initializes a set of feature subsets, then evaluates each subset with an objective function based on the rough-set attribute dependency degree, and finally applies the improved whale optimization algorithm to find an acceptable approximately optimal feature subset through iteration. Experimental results on UCI datasets show that, when a support vector machine is used as the evaluation classifier, the proposed algorithm finds feature subsets with less information loss and achieves higher classification accuracy. The proposed algorithm therefore has certain advantages for feature selection.
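The rough-set attribute dependency degree used in the objective function can be sketched as follows. This is a minimal Python sketch, not the paper's implementation: the dependency degree gamma_B(D) = |POS_B(D)| / |U| is standard rough-set theory, but the weighted fitness form and the value of `alpha` are common wrapper-method assumptions, not details taken from the abstract.

```python
from collections import defaultdict

def dependency_degree(data, decisions, subset):
    """Rough-set attribute dependency degree gamma_B(D) = |POS_B(D)| / |U|."""
    # Partition the universe U into indiscernibility classes by the
    # values each object takes on the candidate attribute subset B.
    classes = defaultdict(list)
    for row, d in zip(data, decisions):
        classes[tuple(row[j] for j in subset)].append(d)
    # An equivalence class lies in the positive region iff it is consistent,
    # i.e. every object in it carries the same decision label.
    pos = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return pos / len(data)

def fitness(data, decisions, subset, n_features, alpha=0.9):
    """Objective balancing dependency degree against subset size.
    The weighting alpha is an assumption for illustration; the paper's
    exact objective may differ."""
    gamma = dependency_degree(data, decisions, subset)
    return alpha * gamma + (1 - alpha) * (1 - len(subset) / n_features)
```

In a wrapper search such as the improved whale optimization algorithm, each candidate feature subset would be scored with `fitness`, and the search iterates toward subsets with high dependency degree and few attributes.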
Keywords: feature selection; rough set theory; improved whale optimization algorithm; attribute dependency; optimal feature subset
Classification: TP301.6 [Automation and Computer Technology / Computer System Architecture]