A Self-adaptive Feature Selection Algorithm Based on Random Forest (基于随机森林的自适应特征选择算法)  Cited by: 9


Authors: LIU Kai (刘凯)[1], ZHENG Shan-hong (郑山红)[1], JIANG Quan (蒋权)[1], ZHAO Tian-ao (赵天傲) (School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China)

Affiliation: [1] School of Computer Science and Engineering, Changchun University of Technology, Changchun, Jilin 130012, China

Source: Computer Technology and Development (《计算机技术与发展》), 2018, No. 9, pp. 101-104, 111 (5 pages)

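Funding: Natural Science Foundation of Jilin Province (20130101060JC); Jilin Provincial Education Program "Twelfth Five-Year Plan" Science and Technology Research Fund (2014131; 2014125)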

Abstract: The traditional random forest algorithm selects features at random, which can filter out a small number of important feature variables, and it ignores the effect of correlation among feature variables on the accuracy of predicting the response variable. To address both problems, this paper proposes SARFFS, a self-adaptive feature selection algorithm based on random forest. The algorithm first uses the chi-square test to measure the degree of association between samples and then performs bootstrap sampling, and it defines a method for calculating how strongly a feature represents a class. It then introduces Group LASSO, an adaptive sparse constraint mechanism, to optimize feature selection. Experiments on UCI data sets, run on the Spark distributed computing platform, show that SARFFS selects better feature subsets than the traditional RF algorithm and improves F1 by nearly 9%. An analysis of the top-ranked important features indicates that the algorithm accounts for correlation among features, that this correlation does affect the prediction results, and that the algorithm effectively improves the reliability and stability of the random attribute weights.

Keywords: random forest; self-adaptive; feature selection; Group LASSO method

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]
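
Illustrative note: the abstract names two standard building blocks, the chi-square test and the Group LASSO sparsity constraint, without giving the paper's formulas. The sketch below shows only the textbook forms of those two pieces and is not the SARFFS algorithm itself. The function names `chi_square_statistic` and `group_soft_threshold`, the grouping of weights, and the threshold `lam` are all assumptions introduced for illustration.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table.

    `table` is a list of rows of observed counts (hypothetical input
    format; the paper's exact use of the test is not described in the
    abstract).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def group_soft_threshold(weights, groups, lam):
    """Proximal step of the Group LASSO penalty lam * sum_g ||w_g||_2.

    Each group of weights is shrunk toward zero as a block; a group whose
    Euclidean norm is at most `lam` is set to zero entirely, which is how
    the penalty discards whole groups of features. `groups` is a list of
    index lists (an assumed representation).
    """
    result = list(weights)
    for idx in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            result[i] = scale * weights[i]
    return result


if __name__ == "__main__":
    # Association between a binary feature and a binary class label.
    print(chi_square_statistic([[10, 20], [20, 10]]))
    # Two feature groups: the strong one is shrunk, the weak one is dropped.
    print(group_soft_threshold([3.0, 4.0, 0.1, 0.1], [[0, 1], [2, 3]], 1.0))
```

In the second call the first group has norm 5 and is scaled by 0.8, while the second group has norm below the threshold and is zeroed as a whole. This is the group-level sparsity that distinguishes Group LASSO from the ordinary LASSO, which zeroes individual weights.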

 
