Sampling Method Based on Slice Inverse Regression in Big Data (大数据情境下基于切片逆回归的抽样方法研究)  Cited by: 3


Authors: HE Jianfeng (贺建风); SHI Li (石立) (School of Economics and Finance, South China University of Technology, Guangzhou, Guangdong 510006, China; School of Economics and Trade, Guangzhou Huashang College, Guangzhou, Guangdong 511300, China)

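Affiliations: [1] School of Economics and Finance, South China University of Technology, Guangzhou, Guangdong 510006 [2] School of Economics and Trade, Guangzhou Huashang College, Guangzhou, Guangdong 511300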

Source: Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》), 2022, No. 1, pp. 91-99 (9 pages)

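Funding: National Social Science Fund of China (19BTJ022); National Statistical Science Research Major Project (2020LD02); Guangzhou Philosophy and Social Science Planning Think Tank Project (2021GZZK03); Guangdong Provincial Innovation Team Project for Regular Universities (2020WCXTD008); Guangzhou Huashang College Mentorship Program Project (2021HSDS01).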

Abstract: In the era of big data, sample surveys remain an indispensable method for data acquisition and statistical inference, but survey sampling methods must adapt to the big data context to realize their full value. The central question is how to draw a sample that is representative with respect to the study variable. This paper proposes a composite-score sampling method based on slice inverse regression (SIR). Because SIR incorporates information about the dependent variable into the independent variables, the method first applies SIR to the big data set, with a modified dimension reduction step, and then computes each unit's composite principal component score, which serves as that unit's inclusion probability for sampling. Simulation results show that, in the big data context, the proposed method yields better estimates than both estimation without sampling and estimation under simple random sampling, and that the improvement is larger when the differences between individual units are greater. Finally, a test on real data confirms the feasibility and effectiveness of the method.
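The two-stage idea in the abstract (SIR first, then score-based unequal-probability sampling) can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not say how the dimension reduction step is modified or how scores are converted into probabilities. The code below uses textbook SIR, and the function names, the sign alignment, the `floor` shift that keeps scores strictly positive, the with-replacement draw, and the Hansen-Hurwitz estimator are all assumptions made for illustration.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_components=1):
    """Textbook SIR (not the paper's modified variant).
    Returns directions (p x k), their eigenvalues (k,), and the column means."""
    n, p = X.shape
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # cov^(-1/2)
    Z = (X - mu) @ inv_sqrt                                # whitened predictors
    # Slice the units by sorted y and accumulate the weighted
    # covariance of the slice means of Z.
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    lam, eta = np.linalg.eigh(M)
    top = np.argsort(lam)[::-1][:n_components]             # largest eigenvalues
    directions = inv_sqrt @ eta[:, top]                    # back to the X scale
    # Assumption: flip each direction so its component correlates positively with y.
    sign = np.sign(((X - mu) @ directions).T @ (y - y.mean()))
    sign[sign == 0] = 1.0
    return directions * sign, lam[top], mu

def composite_probabilities(X, directions, lam, mu, floor=1.0):
    """Eigenvalue-weighted composite score, shifted to be strictly positive
    (the shift is an assumption) and normalized to sum to one."""
    raw = ((X - mu) @ directions) @ (lam / lam.sum())
    shifted = (raw - raw.min()) / raw.std() + floor
    return shifted / shifted.sum()

def hansen_hurwitz_mean(y_sample, p_sample, N):
    """Unbiased mean estimator for with-replacement draws with probabilities p."""
    return np.mean(y_sample / (N * p_sample))

# Synthetic illustration (made-up data, not from the paper).
rng = np.random.default_rng(0)
N = 5000
X = rng.normal(size=(N, 5))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
y = 3.0 + X @ beta + 0.5 * rng.normal(size=N)

directions, lam, mu = sir_directions(X, y)
probs = composite_probabilities(X, directions, lam, mu)
idx = rng.choice(N, size=2000, replace=True, p=probs)
estimate = hansen_hurwitz_mean(y[idx], probs[idx], N)
```

Dividing each sampled value by `N * p` is what keeps the estimate unbiased even though units with high scores are drawn more often; a different score-to-probability mapping would change the variance but not that property.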

Keywords: big data; slice inverse regression; principal component analysis; composite score; sampling estimation

Classification number: O212.2 [Science: Probability Theory and Mathematical Statistics]

 
