Research on Distributed Robust Feature Screening Method for Large Scale Data


Authors: WANG Kang-ning, HAO Meng-jie, WANG Hong-wei, CAI Chao[1] (School of Statistics, Shandong University of Business and Technology, Yantai 264005, China)

Affiliation: [1] School of Statistics, Shandong University of Business and Technology, Yantai, Shandong 264005, China

Source: Journal of Applied Statistics and Management (《数理统计与管理》), 2025, No. 1, pp. 144-156 (13 pages)

Funding: National Natural Science Foundation of China (11901356).

Abstract: Feature screening methods are sensitive to outliers, which can bias the screening results and even lead to wrong decisions. To address this problem, this paper proposes a distributed RACS feature screening method. Building on the distributed ACS screening method, the correlation coefficient is expressed as a function of component parameters, and the median of the U-statistics computed on the local machines is used as an unbiased estimator of each component parameter. The method is applied to several types of models to examine its performance under different boundary parameters, numbers of groups and contamination proportions. The results show that the ACS screening method has poor robustness, whereas the RACS screening method always screens out all of the true variables; the RACS feature screening method therefore achieves a better screening effect than the ACS feature screening method.
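The abstract states the core idea only briefly. As an illustration, assume the screening statistic is the Pearson correlation between each feature X_j and the response Y, written as a function of three component parameters, rho_j = sigma_jY / sqrt(sigma_j^2 * sigma_Y^2). On a single machine each component has an unbiased U-statistic: the kernel (x1 - x2)(y1 - y2)/2 yields the sample covariance with denominator n - 1. The Python sketch below is our own illustration, not the authors' code. It computes these components on each simulated machine, pools them across machines by the median (RACS-style) or by the mean (ACS-style), rebuilds the correlation and keeps the d features with the largest |rho_j|. The helper names `local_components` and `distributed_screen`, the choice of Pearson correlation, the equal-sized data split and the contamination scheme are all hypothetical.

```python
import numpy as np


def local_components(X, y):
    """Component estimates on one machine (hypothetical helper).

    The sample covariance with denominator n-1 is the U-statistic with kernel
    h((x1, y1), (x2, y2)) = (x1 - x2) * (y1 - y2) / 2; the variances are the
    special cases x = y.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    cov_xy = Xc.T @ yc / (n - 1)             # sigma_jY for every feature j, shape (p,)
    var_x = (Xc ** 2).sum(axis=0) / (n - 1)  # sigma_j^2, shape (p,)
    var_y = yc @ yc / (n - 1)                # sigma_Y^2, scalar
    return cov_xy, var_x, var_y


def distributed_screen(X, y, n_machines, d, aggregate=np.median):
    """Rank features by |rho_j| rebuilt from pooled component estimates.

    aggregate=np.median gives robust, RACS-style pooling across machines;
    aggregate=np.mean gives plain averaging, ACS-style, for comparison.
    """
    # Assumption: the full sample is split into equal-sized blocks, one per machine.
    blocks = np.array_split(np.arange(X.shape[0]), n_machines)
    parts = [local_components(X[b], y[b]) for b in blocks]
    cov_xy = aggregate(np.stack([c for c, _, _ in parts]), axis=0)
    var_x = aggregate(np.stack([v for _, v, _ in parts]), axis=0)
    var_y = aggregate(np.array([s for _, _, s in parts]))
    rho = cov_xy / np.sqrt(var_x * var_y)   # correlation rebuilt from pooled components
    return np.argsort(-np.abs(rho))[:d], rho


# Toy example (simulated data): only features 0 and 1 are active.
rng = np.random.default_rng(0)
n, p, K = 2000, 50, 10
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] + 3 * X[:, 1] + rng.standard_normal(n)
# Contaminate machine 0 (rows 0..199): its responses are driven by inactive feature 10.
y[:200] += 1000 * X[:200, 10]

racs_top, _ = distributed_screen(X, y, K, d=2, aggregate=np.median)
acs_top, _ = distributed_screen(X, y, K, d=2, aggregate=np.mean)
print("median pooling:", sorted(racs_top.tolist()))
print("mean pooling:  ", acs_top.tolist())
```

In this toy setup a single contaminated machine dominates the mean of the covariance and variance estimates, so mean pooling is expected to push the inactive feature 10 to the top. The median across the ten machines effectively ignores that machine and should keep features 0 and 1. The exact correlation values depend on the random seed.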

Keywords: large-scale data; distributed framework; feature screening

CLC number: O212 [Science: Probability Theory and Mathematical Statistics]
