Research on Distributed Inference Methods for Large-scale Non-probability Sample Data

Original title: 大规模非概率样本数据的分布式推断方法研究


Authors: Liu Zhan (刘展); Pan Yingli (潘莹丽)

Affiliations: [1] Faculty of Mathematics and Statistics, Hubei University, Wuhan 430062, China; [2] Hubei Key Laboratory of Applied Mathematics, Hubei University, Wuhan 430062, China

Source: Statistics & Decision (《统计与决策》), 2025, No. 7, pp. 53-58 (6 pages)

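Funding: National Social Science Fund of China, General Project (18BTJ022); National Social Science Fund of China, Western Region Project (21XTJ006); planning project of the China Business Statistics Society (中国商业统计学会) (2024STZD22).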

Abstract: With the growth of big data and the internet, non-probability sample data sets keep getting larger, and inference methods designed for a single machine are no longer applicable. How to carry out distributed statistical inference on large-scale non-probability sample data across multiple machines has become an active research question. For such data, this paper proposes a One-shot distributed propensity score inference method. First, the non-probability sample data and the reference sample data are partitioned across different Worker machines, a logistic propensity score model is specified, and the model parameters are estimated from the data on each Worker machine. Second, these local estimates are sent to the Master machine, where a weighted average gives the final estimate of the propensity score model parameters. Finally, the population estimate is obtained from the non-probability sample data on the Worker machines together with the estimated propensity scores. Simulation analysis and an empirical study both show that the proposed estimator has smaller relative bias, variance, and mean squared error than the distributed simple estimator and is close to the global estimator, indicating good estimation performance.
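The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the averaging weights, the form of the population estimator, or the simulation design, so everything below is an assumption made for illustration. In particular, the sketch assumes the reference sample is a simple random sample, weights each Worker's estimate by its local sample size, and uses inverse-odds pseudo-weights with a Hajek-type ratio for the population mean. All function names (`fit_logistic`, `one_shot_beta`, `distributed_mean`) are hypothetical.

```python
import numpy as np

def fit_logistic(X, z, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (z - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def stack(X_np, X_ref):
    """Stack both samples; z = 1 marks non-probability units, z = 0 reference units."""
    X = np.vstack([X_np, X_ref])
    z = np.concatenate([np.ones(len(X_np)), np.zeros(len(X_ref))])
    return X, z

def one_shot_beta(np_blocks, ref_blocks):
    """Step 1: fit locally on each Worker. Step 2: weighted average on the Master."""
    betas, sizes = [], []
    for X_np, X_ref in zip(np_blocks, ref_blocks):
        X, z = stack(X_np, X_ref)
        betas.append(fit_logistic(X, z))
        sizes.append(len(z))
    # Assumption: weights proportional to each Worker's local sample size.
    weights = np.asarray(sizes) / np.sum(sizes)
    return weights @ np.asarray(betas)

def distributed_mean(np_blocks, y_blocks, beta):
    """Step 3: each Worker returns two sums; the Master forms a ratio estimate."""
    num = den = 0.0
    for X_np, y in zip(np_blocks, y_blocks):
        # Assumption: inverse-odds pseudo-weights, (1 - p) / p = exp(-x'beta).
        w = np.exp(-(X_np @ beta))
        num += np.sum(w * y)
        den += np.sum(w)
    return num / den

# Synthetic population (illustrative values only, not from the paper).
rng = np.random.default_rng(0)
N, n_ref, K = 200_000, 10_000, 4
x = rng.normal(size=N)
y = 2.0 + x + 0.5 * rng.normal(size=N)
X_pop = np.column_stack([np.ones(N), x])
pi = np.minimum(1.0, np.exp(-3.5 + 0.6 * x))        # selection depends on x
in_np = rng.random(N) < pi                          # non-probability sample
ref_idx = rng.choice(N, size=n_ref, replace=False)  # reference sample (SRS)

X_np, y_np, X_ref = X_pop[in_np], y[in_np], X_pop[ref_idx]
np_blocks = np.array_split(X_np, K)
y_blocks = np.array_split(y_np, K)
ref_blocks = np.array_split(X_ref, K)

beta_os = one_shot_beta(np_blocks, ref_blocks)
beta_global = fit_logistic(*stack(X_np, X_ref))     # single-machine benchmark
est_os = distributed_mean(np_blocks, y_blocks, beta_os)
est_naive = y_np.mean()                             # unweighted sample mean
true_mean = y.mean()
print(beta_os, beta_global)
print(est_os, est_naive, true_mean)
```

Only the local coefficient vectors and two scalar sums per Worker travel to the Master, which is what makes the scheme one-shot: a single round of communication. In this synthetic setup the unweighted mean of the non-probability sample overstates the population mean because selection favours large `x`, while the weighted estimate corrects for it.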

Keywords: large-scale; non-probability sample; distributed; One-shot; propensity score

Classification code: C811 [Sociology: Statistics]

 
