An interactive feature selection method based on learning-from-crowds (基于众包学习的交互式特征选择方法)  Cited by: 4

Authors: Changjian CHEN (陈长建); Liu JIANG (姜流); Na LEI (雷娜); Shixia LIU (刘世霞) [1]

Affiliations: [1] School of Software, Tsinghua University, Beijing 100084, China; [2] International School of Information Science and Engineering, Dalian University of Technology, Dalian 116024, China

Source: Scientia Sinica (Informationis) (《中国科学:信息科学》), 2020, No. 6, pp. 794-812 (19 pages)

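Funding: Supported by the National Key Research and Development Program of China (Grant No. 2018YFB1004300) and the National Natural Science Foundation of China (Grant Nos. 61672308, 61761136020, 61936002).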

Abstract: Ensemble feature selection algorithms aggregate the results of multiple feature selection methods to obtain a more effective feature subset. However, these algorithms typically treat every feature selection method equally and ignore differences in their performance. Consequently, effective features selected by only a few methods may be overlooked. To address this problem, we propose an interactive feature selection method that effectively combines the strengths of different feature selection methods and iteratively improves the selected features by integrating expert knowledge. The proposed method comprises a learning-from-crowds-based ensemble feature selection algorithm and a visual analysis system built on that algorithm. The algorithm uses a learning-from-crowds model to model the performance of the individual feature selection methods, estimates the reliability of each method, and aggregates their results accordingly. The visual analysis system provides a rich set of ranking schemes that help experts understand the results of each individual feature selection method and the roles the features play in the classification task, so that experts can interactively and iteratively refine the current feature subset. Numerical experiments on four real-world datasets show that the proposed algorithm improves classification accuracy by 0.63%–2.85% compared with existing ensemble feature selection algorithms. In addition, two case studies on text and image datasets show that the visual analysis system can further improve classification accuracy by 0.28%–5.24%.
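The abstract does not specify the learning-from-crowds model, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each feature selection method casts a binary vote per feature (selected or not) and uses a simple one-coin, Dawid-Skene-style EM procedure as a stand-in: each method gets a single reliability, and each feature gets a posterior probability of being relevant. The function name `aggregate_votes`, the parameters `n_iter` and `prior`, and the toy vote matrix are all hypothetical.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def aggregate_votes(votes, n_iter=50, prior=0.5):
    """Reliability-weighted aggregation of binary feature-selection votes.

    votes[m][f] is 1 if method m selected feature f, else 0.
    Assumed one-coin model: method m agrees with the (unknown) true
    relevance of a feature with probability rel[m].
    Returns (posterior relevance per feature, reliability per method).
    """
    n_methods = len(votes)
    n_features = len(votes[0])
    eps = 1e-6

    # Start from the plain (unweighted) vote fraction per feature.
    post = [sum(votes[m][f] for m in range(n_methods)) / n_methods
            for f in range(n_features)]
    rel = [0.5] * n_methods

    for _ in range(n_iter):
        # M-step: reliability = expected agreement with the current posterior.
        for m in range(n_methods):
            agree = sum(post[f] if votes[m][f] == 1 else 1.0 - post[f]
                        for f in range(n_features))
            rel[m] = min(max(agree / n_features, eps), 1.0 - eps)

        # E-step: posterior relevance from reliability-weighted log-odds.
        for f in range(n_features):
            log_odds = math.log(prior) - math.log(1.0 - prior)
            for m in range(n_methods):
                w = math.log(rel[m]) - math.log(1.0 - rel[m])
                log_odds += w if votes[m][f] == 1 else -w
            post[f] = _sigmoid(log_odds)

    return post, rel


# Toy example: methods 0 and 1 agree with each other, method 2 mostly disagrees.
toy_votes = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
]
posterior, reliability = aggregate_votes(toy_votes)
selected = [f for f, p in enumerate(posterior) if p > 0.5]
```

In this toy run the two mutually consistent methods end up with a higher estimated reliability than the third, and the selected subset follows them. Unlike equal-weight voting, a scheme of this kind lets a feature chosen by only a few highly reliable methods still receive a high posterior; whether that happens depends on the data and on the model actually used.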

Keywords: ensemble feature selection; learning-from-crowds; visual analytics; interactive visualization; ranking visualization

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
