Authors: LIU Kai (刘凯)[1], ZHENG Shan-hong (郑山红)[1], JIANG Quan (蒋权)[1], ZHAO Tian-ao (赵天傲)
Affiliation: [1] School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, Jilin, China
Source: Computer Technology and Development (《计算机技术与发展》), 2018, No. 9, pp. 101-104, 111 (5 pages)
Funding: Natural Science Foundation of Jilin Province (20130101060JC); Jilin Provincial Education Plan "Twelfth Five-Year" Science and Technology Research Fund (2014131; 2014125)
Abstract: Random feature selection in the traditional random forest (RF) algorithm can filter out a small number of relatively important feature variables, and it does not account for how correlation among feature variables affects the accuracy of the predicted response variable. To address both problems, we propose SARFFS, a self-adaptive feature selection algorithm based on random forests. The algorithm first uses the chi-square test to measure the degree of association between samples and then performs bootstrap sampling, and it defines a method for calculating how strongly a feature represents a class. It then introduces Group LASSO, an adaptive sparse constraint mechanism, to optimize feature selection. Finally, experiments are carried out on the Spark distributed computing platform using UCI data sets. The results show that, compared with the traditional RF algorithm, SARFFS performs better at feature subset selection and improves F1 by nearly 9%. Analysis of the top-ranked important features shows that the algorithm takes the correlation among features into account, that this correlation does affect the prediction results, and that the algorithm effectively improves the reliability and stability of the random attribute weights.
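Keywords: random forest; self-adaptive; feature selection; Group LASSO method
Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]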
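
The abstract names two standard building blocks, the chi-square test and the Group LASSO sparsity constraint, without giving formulas. The sketch below illustrates those two textbook operations only; it is not the SARFFS algorithm. The function names `chi_square_score` and `group_soft_threshold` are hypothetical, and applying the chi-square statistic to a feature/label pair and the threshold to a single weight group are assumptions made for illustration.

```python
import math
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the class labels.

    Illustrative only: the record does not specify how SARFFS applies the test.
    """
    n = len(feature)
    observed = Counter(zip(feature, labels))  # joint counts per (value, class) cell
    feature_totals = Counter(feature)         # row totals
    label_totals = Counter(labels)            # column totals
    score = 0.0
    for value, value_count in feature_totals.items():
        for cls, cls_count in label_totals.items():
            expected = value_count * cls_count / n  # count expected under independence
            score += (observed[(value, cls)] - expected) ** 2 / expected
    return score


def group_soft_threshold(weights, lam):
    """Proximal operator of the Group LASSO penalty lam * ||w||_2 for one group.

    The whole group is set to zero when its Euclidean norm is at most lam;
    otherwise every weight in the group is shrunk by the same factor.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= lam:
        return [0.0] * len(weights)
    scale = 1.0 - lam / norm
    return [scale * w for w in weights]
```

A feature that separates the classes perfectly receives a high chi-square score, while an independent one scores zero. Group soft-thresholding either drops a whole group of weights or shrinks it uniformly, which is why Group LASSO yields sparsity at the group level rather than in individual coefficients.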