Authors: ZHANG Dong-fang (张东方); CHEN Hai-yan (陈海燕)[1,2]; YUAN Li-gang (袁立罡)
Affiliations: [1] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China [2] Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093, Jiangsu, China [3] College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China
Source: Computer and Modernization (《计算机与现代化》), 2021, No. 9, pp. 113-120, 126 (9 pages)
Funding: National Natural Science Foundation of China (61501229); Fundamental Research Funds for the Central Universities (NS2019054, NS2020045).
Abstract: Feature selection is one of the key problems in pattern recognition and data mining; it removes redundant and irrelevant features from a dataset to improve learning performance. Based on the max-relevance and min-redundancy criterion, a novel semi-supervised feature selection method based on relevance and redundancy analysis (S2R2) is proposed. S2R2 is independent of any classification learning algorithm. The method first analyzes and extends an unsupervised relevance measure, then combines it with information gain to design semi-supervised measures of feature relevance and redundancy that can effectively identify and remove irrelevant and redundant features. Finally, an incremental search greedily constructs the feature subset, which avoids searching an exponentially large solution space and improves running efficiency. A fast filter version of S2R2, called FS2R2, is also proposed to better handle large-scale feature selection problems. Experimental results on multiple standard datasets demonstrate the effectiveness and superiority of the proposed methods.
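The greedy incremental search mentioned in the abstract can be illustrated with a generic max-relevance, min-redundancy sketch. This is not the paper's S2R2 measure: it assumes fully labeled, discrete data and uses plain mutual information for both relevance and redundancy. The function names and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def greedy_mrmr(features, labels, k):
    """Select up to k feature names by incremental forward search.

    features: dict mapping a feature name to its list of discrete values.
    labels:   list of class labels, one per sample.
    Each step adds the feature with the highest
    (relevance to labels) - (mean redundancy with already selected features).
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "b" duplicates "a", "d" is constant noise.
toy_features = {
    "a": [0, 0, 1, 1],
    "b": [0, 0, 1, 1],
    "c": [0, 1, 0, 1],
    "d": [0, 0, 0, 0],
}
toy_labels = [0, 1, 2, 3]  # label = 2 * a + c

print(greedy_mrmr(toy_features, toy_labels, 2))  # ['a', 'c']
```

In this toy case the duplicate feature `b` is as relevant as `a`, but it is skipped in the second step because its redundancy with `a` cancels its relevance. Each step scores only the remaining candidates, so the search never enumerates the exponential number of possible subsets.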
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]