Multi-label online stream feature selection based on Adaptive Neighborhood Rough Set  Cited by: 2


Authors: ZHANG Haixiang, LI Peipei[3], HU Xuegang[3]

Affiliations: [1] Hefei Hospital Affiliated to Anhui Medical University, Hefei 230012, Anhui, China; [2] The Second People's Hospital of Hefei, Hefei 230012, Anhui, China; [3] Key Laboratory of Big Data Knowledge Engineering, Ministry of Education, Hefei University of Technology, Hefei 230601, Anhui, China

Source: Microelectronics & Computer, 2022, No. 7, pp. 44-53 (10 pages)

Funding: Key Research and Development Program, Project Topic 1 (2016YFB1000901); National Natural Science Foundation of China (61976077, 62076085, 91746209).

Abstract: Multi-label feature selection aims to select representative attributes in multi-label scenarios. Most existing multi-label feature selection methods assume that the entire feature space is available in advance and do not consider streaming features, which flow into the model one by one over time. In addition, some streaming feature methods require parameters to be specified before learning, so choosing a single optimal parameter setting before training on different types of data sets becomes difficult. Motivated by this, this paper defines an adaptive neighborhood rough set relation, Gap, and proposes Multi-Label Online stream Feature Selection based on Adaptive Neighborhood Rough Set (ML-OFS-ANRS). Data mining with neighborhood rough sets requires no prior knowledge of the feature space structure and does not break the neighborhood and order structure of the data when handling mixed data. In the first stage, relevant and important features are added to the selected subset according to dynamic maximal dependency. In the second stage, to filter out redundant features, the importance of each feature is calculated and parallel reduction is performed on the selected subset. Thus, with the "dynamic maximal-dependency, online redundancy reduction" evaluation criteria, ML-OFS-ANRS can select features with high relevance and low redundancy. Experiments on 10 data sets of different types show that, for the same number of selected features, ML-OFS-ANRS outperforms traditional feature selection methods and state-of-the-art online streaming feature selection algorithms.
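
The two-stage idea in the abstract can be illustrated with a small sketch. The abstract does not give the definition of the Gap relation or of the dependency measure, so everything below is an assumption made for illustration and not the authors' ML-OFS-ANRS implementation: the neighborhood of a sample is taken to end at the largest jump in its sorted distances (so no radius parameter is needed), dependency is taken to be the share of (sample, label) pairs whose neighborhood agrees on that label, and the names `gap_neighbors`, `dependency`, `stream_select`, and `toy_data` are hypothetical.

```python
def gap_neighbors(dists):
    """Indices of the neighbors obtained by cutting the sorted distances
    at their largest gap (an assumed, parameter-free reading of 'Gap')."""
    order = sorted(range(len(dists)), key=lambda j: dists[j])
    sorted_d = [dists[j] for j in order]
    if sorted_d[-1] == sorted_d[0]:        # all samples equally far: keep everyone
        return order
    gaps = [b - a for a, b in zip(sorted_d, sorted_d[1:])]
    cut = gaps.index(max(gaps))            # first position of the largest jump
    return order[: cut + 1]


def dependency(X, Y, feats):
    """Share of (sample, label) pairs whose gap neighborhood agrees on the label."""
    if not feats:
        return 0.0
    n, q = len(X), len(Y[0])
    consistent = 0
    for i in range(n):
        # Manhattan distance restricted to the candidate feature subset
        dists = [sum(abs(X[i][f] - X[j][f]) for f in feats) for j in range(n)]
        nb = gap_neighbors(dists)
        consistent += sum(all(Y[j][l] == Y[i][l] for j in nb) for l in range(q))
    return consistent / (n * q)


def stream_select(X, Y):
    """Process features one at a time, as if they arrived in a stream."""
    selected = []
    for f in range(len(X[0])):
        # Stage 1: keep the arriving feature only if it raises the dependency.
        if dependency(X, Y, selected + [f]) > dependency(X, Y, selected):
            selected.append(f)
            # Stage 2: drop earlier features that have become redundant,
            # i.e. whose removal does not lower the dependency.
            current = dependency(X, Y, selected)
            for g in selected[:-1]:
                rest = [h for h in selected if h != g]
                if dependency(X, Y, rest) >= current:
                    selected = rest
                    current = dependency(X, Y, selected)
    return selected


def toy_data():
    """Eight samples, three features (noise, informative, duplicate), two labels."""
    informative = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
    noise = [0.0, 5.0, 0.1, 5.1, 0.2, 5.2, 0.3, 5.3]
    X = [[noise[i], informative[i], informative[i]] for i in range(8)]
    Y = [[0, 1]] * 4 + [[1, 0]] * 4
    return X, Y


if __name__ == "__main__":
    X, Y = toy_data()
    print(stream_select(X, Y))
```

On this toy data the noisy feature 0 is skipped because it adds no dependency, feature 1 is kept, and its duplicate (feature 2) is rejected because it cannot raise the dependency any further.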

Keywords: multi-label classification; feature stream; neighborhood rough set; online streaming feature selection

Classification code: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]

 
