Data Integration by Combining Probability and Non-Probability Samples for Finite Population Inference in the Context of Big Data: An Error Correction Perspective

Cited by: 2

Authors: Liu Xiaoyu; Jin Yongjin [2]; Ni Cheng

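Affiliations: [1] School of Statistics, Capital University of Economics and Business; [2] Center for Applied Statistics, Renmin University of China; [3] School of Statistics, Renmin University of China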

Source: Statistical Research (《统计研究》), 2023, No. 8, pp. 149-160 (12 pages)

Funding: Research Start-up Fund for Newly Hired Young Faculty, Capital University of Economics and Business (XRZ2023076).

Abstract: Internet-based survey data collection is cheap and fast. However, the resulting samples are usually non-probability samples subject to coverage error and selection bias; they do not represent the target population and cannot be used directly for finite population inference. Data integration combines the strengths of probability and non-probability samples and can be used to correct the bias of the latter. This paper treats the non-probability sample as an incomplete sampling frame for the finite population. Assuming that the probability and non-probability samples overlap, it constructs data-integration post-stratification and calibration estimators. This assumption is the basis of the calibration. Within this framework, the paper considers the correction of measurement error: when the probability sample or the non-probability sample is measured with error, it proposes two approaches, namely calibration based on unbiased true values and calibration based on corrected biased measurements. The paper also proposes a semi-supervised classification method based on Bagging decision trees to identify the units that belong to both samples, which is of practical value in applied work.
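The post-stratification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator, whose exact form is not given in this record; it is a generic two-stratum construction under stated assumptions: the population size is known, the non-probability sample is observed without measurement error, and the overlap between the two samples has already been identified. The function name `poststratified_total` and all of its parameters are hypothetical.

```python
def poststratified_total(y_np, y_prob, d_prob, in_np, pop_size):
    """Two-stratum post-stratified estimate of a population total.

    Illustrative assumptions (not taken from the paper):
      y_np     : study values for every unit in the non-probability sample,
                 treated as a fully observed "covered" stratum
      y_prob   : study values for the probability-sample units
      d_prob   : design weights of the probability-sample units
      in_np    : True where a probability-sample unit also belongs to the
                 non-probability sample (the overlap)
      pop_size : known size of the finite population
    """
    if not (len(y_prob) == len(d_prob) == len(in_np)):
        raise ValueError("probability-sample inputs must have equal length")

    # Covered stratum: its total is known exactly from the non-probability sample.
    n_covered = len(y_np)
    total_covered = sum(y_np)

    # Uncovered stratum: design-weighted mean over the probability-sample
    # units that do not appear in the non-probability sample.
    weight_sum = sum(d for d, flag in zip(d_prob, in_np) if not flag)
    if weight_sum == 0:
        raise ValueError("no probability-sample units outside the overlap")
    weighted_y = sum(d * y for y, d, flag in zip(y_prob, d_prob, in_np) if not flag)
    mean_uncovered = weighted_y / weight_sum

    # Scale the uncovered mean by the number of uncovered population units.
    return total_covered + (pop_size - n_covered) * mean_uncovered


# Toy data, invented purely for illustration.
estimate = poststratified_total(
    y_np=[1.0, 2.0, 3.0],
    y_prob=[2.0, 4.0, 6.0],
    d_prob=[2.0, 3.0, 2.0],
    in_np=[True, False, False],
    pop_size=10,
)
print(estimate)
```

The sketch depends entirely on the `in_np` flags being correct, which is why identifying the overlapping units matters; the abstract's Bagging-based classifier addresses that step and is not reproduced here.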

Keywords: data integration; non-probability sample; measurement error; calibration; Bagging decision tree

Classification code: C811 [Sociology: Statistics]

 
