A New Method for Fusion Inference of Non-probabilistic and Probabilistic Samples


Authors: Liu Zhan (刘展) [1]; Wang Dianni (王典妮); Pan Yingli (潘莹丽); Peng Lu (彭璐)

Affiliations: [1] Faculty of Mathematics and Statistics, Hubei University, Wuhan 430062, China [2] School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China [3] School of Economics, Jinan University, Guangzhou 510632, China

Source: Statistics & Decision (《统计与决策》), 2023, No. 8, pp. 5-11 (7 pages)

Funding: National Social Science Fund of China (21XTJ006).

Abstract: With the development of big data and web surveys, non-probability samples have regained attention. However, their selection probabilities are unknown, which makes population inference from them difficult. Probability samples have known selection probabilities, but rising non-response rates make missing data increasingly severe, and a probability sample with missing data may yield biased population estimates. Weighing the strengths and weaknesses of the two kinds of sample, this paper proposes a fusion inference method that combines a non-probability sample with a probability sample. It assumes that all variables are fully observed in the non-probability sample, whereas in the probability sample the covariates are fully observed but the target variable is missing. First, a superpopulation local polynomial regression model is fitted to the non-probability sample and used to predict the missing target variable in the probability sample, yielding complete probability sample data. Then, the two samples are combined to fit a propensity score model that estimates the propensity score of each sample unit; the scores are further adjusted by two methods, propensity score inverse weighting and propensity score weighting class adjustment, to estimate the selection probabilities of the non-probability sample, from which its weights are constructed. Finally, the weights of both samples are adjusted further so that the two samples are fused into a single sample, which is used to estimate the population. Simulation and empirical studies show that the fused estimate based on the superpopulation model and the propensity score model has smaller bias, variance and mean squared error than the estimate from either sample alone.
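The three steps in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the Gaussian kernel and bandwidth, the logistic propensity model, the odds-based pseudo-weight `d * (1 - p) / p`, the five weighting classes, and the sample-size-proportional fusion factor `lam` are all illustrative choices, and every function name (`local_linear_predict`, `fit_logistic`, `class_mean_propensity`, `simulate_fusion`) is hypothetical.

```python
import numpy as np

def local_linear_predict(x_train, y_train, x_new, bandwidth=0.1):
    """Local polynomial regression of degree 1 with a Gaussian kernel (assumed)."""
    preds = np.empty(len(x_new))
    for j, x0 in enumerate(x_new):
        u = x_train - x0
        sw = np.sqrt(np.exp(-0.5 * (u / bandwidth) ** 2))  # sqrt of kernel weights
        design = np.column_stack([np.ones_like(u), u])
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y_train * sw, rcond=None)
        preds[j] = beta[0]  # the local intercept is the fitted value at x0
    return preds

def fit_logistic(X, z, iters=25):
    """Unweighted logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, X.T @ (z - p))
    return beta

def class_mean_propensity(p, n_classes=5):
    """Weighting-class adjustment: replace each score by the mean of its class."""
    edges = np.quantile(p, np.linspace(0, 1, n_classes + 1)[1:-1])
    cls = np.digitize(p, edges)  # class labels 0 .. n_classes - 1
    means = np.array([p[cls == c].mean() for c in range(n_classes)])
    return means[cls]

def simulate_fusion(seed=0, N=20000, n_prob=500):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, N)
    y = 1 + 2 * x + x ** 2 + rng.normal(0, 0.5, N)
    # Non-probability sample: self-selection depends on x, unknown to the analyst.
    in_np = rng.uniform(size=N) < 1.0 / (1.0 + np.exp(4 - 2 * x))
    x_np, y_np = x[in_np], y[in_np]
    # Probability sample: simple random sample, covariate observed, y missing.
    x_p = x[rng.choice(N, n_prob, replace=False)]
    d = N / n_prob  # design weight of every probability-sample unit
    # Step 1: predict the missing target variable from the non-probability sample.
    y_p_hat = local_linear_predict(x_np, y_np, x_p)
    # Step 2: propensity of belonging to the non-probability sample in the stack.
    x_all = np.concatenate([x_np, x_p])
    X = np.column_stack([np.ones_like(x_all), x_all])
    z = np.concatenate([np.ones(len(x_np)), np.zeros(n_prob)])
    beta = fit_logistic(X, z)
    p_np = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x_np)))
    lam = len(x_np) / (len(x_np) + n_prob)  # assumed fusion factor
    y_all = np.concatenate([y_np, y_p_hat])
    results = {"true": y.mean(), "naive": y_np.mean()}
    for name, p in [("ipw", p_np), ("class", class_mean_propensity(p_np))]:
        w_np = d * (1 - p) / p  # assumed pseudo-weight: design weight / odds
        # Step 3: rescale both weight sets so the fused weights sum to N.
        w = np.concatenate([lam * w_np * N / w_np.sum(),
                            np.full(n_prob, (1 - lam) * d)])
        results[name] = np.sum(w * y_all) / w.sum()
    return results

if __name__ == "__main__":
    for key, value in simulate_fusion().items():
        print(f"{key:>6}: {value:.3f}")
```

Here `naive` is the unweighted mean of the non-probability sample, while `ipw` and `class` are the fused estimates using inverse propensity weighting and weighting-class adjustment respectively. Because selection in this toy population favours large `x`, the naive mean is expected to overshoot the true mean, which is the kind of bias the fusion step is meant to reduce.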

Keywords: superpopulation model; propensity score model; non-probability sample; probability sample; local polynomial regression model

Classification code: O212.1 [Science: Probability Theory and Mathematical Statistics]

 
