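Affiliations: [1] School of Computer Science, Shaanxi Normal University, Xi'an 710062 [2] ATR National Key Laboratory, College of Information Engineering, Shenzhen University, Shenzhen, Guangdong 518060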
Source: Chinese Journal of Computers (《计算机学报》), 2014, No. 8, pp. 1704-1718 (15 pages)
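Funding: Supported by the National Natural Science Foundation of China (31372250); the Key Science and Technology Program of Shaanxi Province (2013K12-03-24); and the Fundamental Research Funds for the Central Universities (GK201102007).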
Abstract: To account for the influence of correlation between features on their ability to discriminate between classes, this paper proposes a new criterion for evaluating the discernibility of a feature subset, called DFS (Discernibility of Feature Subsets). DFS takes the correlation between features into account by computing the joint contribution of all features in a subset to classification, rather than the contribution of each feature alone. Combining DFS with four search strategies (sequential forward search, sequential backward search, sequential forward floating search, and sequential backward floating search), with support vector machines (SVM) as the classification tool guiding the selection process, yields four feature selection algorithms based on DFS and SVM. In the sequential forward/backward floating strategies, a feature is first added to/removed from the feature subset according to the DFS criterion; in the floating stage, the feature just added is removed again if the accuracy of the resulting temporary SVM classifier on the training subset goes down, and the feature just removed is called back if that accuracy goes up. The algorithms were tested on 10 datasets from the UCI machine learning repository. The experimental results show that DFS is a good criterion for evaluating the discernibility of a feature subset, that the algorithms based on DFS and SVM reduce the dimension of a dataset without compromising its classification capacity, and that their generalization is much better than that of the available algorithms based on the discernibility of a feature subset. However, the size of the feature subsets they select is not necessarily the best.
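The forward floating procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the DFS formula, the SVM settings, or the stopping rule, so `criterion` and `accuracy` are hypothetical callables standing in for the DFS score of a subset and the training accuracy of a temporary SVM, and the sketch assumes each feature is considered exactly once. The toy scoring functions are invented purely for illustration.

```python
from typing import Callable, FrozenSet, List

Subset = FrozenSet[int]


def forward_floating_select(
    n_features: int,
    criterion: Callable[[Subset], float],
    accuracy: Callable[[Subset], float],
) -> List[int]:
    """Forward selection with a floating check.

    criterion: hypothetical stand-in for the DFS score of a feature subset
               (higher means better class discernibility).
    accuracy:  hypothetical stand-in for the training accuracy of a
               temporary SVM trained on that subset.
    """
    selected: List[int] = []
    remaining = list(range(n_features))
    best_acc = 0.0
    while remaining:  # assumption: every feature is tried exactly once
        # Step 1: choose the feature whose addition gives the best subset score.
        cand = max(remaining, key=lambda f: criterion(frozenset(selected + [f])))
        remaining.remove(cand)
        # Step 2 (floating): keep the new feature only if accuracy did not go down.
        acc = accuracy(frozenset(selected + [cand]))
        if acc >= best_acc:
            selected.append(cand)
            best_acc = acc
    return selected


# Toy stand-ins, invented for illustration only.
WEIGHTS = [0.1, 0.9, 0.5, 0.3]
USEFUL = frozenset({1, 2})


def toy_criterion(subset: Subset) -> float:
    # Scores a subset as a whole; here simply the sum of per-feature weights.
    return sum(WEIGHTS[f] for f in subset)


def toy_accuracy(subset: Subset) -> float:
    # Useful features raise accuracy; irrelevant ones lower it slightly.
    return 0.5 + 0.2 * len(subset & USEFUL) - 0.05 * len(subset - USEFUL)


result = forward_floating_select(4, toy_criterion, toy_accuracy)
print(result)
```

With these toy functions, features 1 and 2 are kept, while features 3 and 0 are added by the criterion and then dropped again because the accuracy stand-in goes down. The backward floating variant would mirror this: remove by criterion, then call the feature back if accuracy improves.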
Keywords: feature selection; support vector machines; correlation; discernibility of feature subsets; feature discernibility
Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]