Authors: Han Xiao; Wang Mingqiu[1]; Zhao Shengli[1]
Affiliation: [1] School of Statistics and Data Science, Qufu Normal University, Qufu, Shandong 273165, China
Source: Statistics & Decision (《统计与决策》), 2024, No. 15, pp. 59-64 (6 pages)
Funding: National Natural Science Foundation of China, General Program (12271294, 12171277).
Abstract: Statistical analysis of big data is challenging under limited computing resources, so analyzing a subset of the data (subdata) instead of the full data has become an option. Based on the robust distance of the minimum covariance determinant, this paper proposes a more efficient subdata selection algorithm for logistic regression models with big data. Extensive numerical simulations compare the proposed algorithm with existing algorithms under several criteria. The results show that the proposed algorithm has high estimation efficiency and computational efficiency, and that its computation time is significantly lower than that of a full-data analysis. Compared with the other algorithms, the subdata it selects has an information matrix with a larger determinant. The proposed algorithm is also robust when the covariates are highly correlated. Finally, an analysis of a real data set shows that the proposed algorithm has a smaller prediction error.
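The two quantities the abstract names can be written out using their standard textbook definitions. This is only a sketch: the notation ($x_i$, $z_i$, $\hat{\mu}_{\mathrm{MCD}}$, $\hat{\Sigma}_{\mathrm{MCD}}$, $S$) is assumed here for illustration and is not taken from the paper, and the abstract does not state how the paper turns robust distances into a selection rule.

```latex
% Robust distance of covariate vector x_i, using the minimum covariance
% determinant (MCD) estimates of location and scatter (assumed notation).
\[
  \mathrm{RD}(x_i) = \sqrt{(x_i - \hat{\mu}_{\mathrm{MCD}})^{\top}
    \hat{\Sigma}_{\mathrm{MCD}}^{-1} (x_i - \hat{\mu}_{\mathrm{MCD}})}
\]

% Logistic regression model with z_i = (1, x_i^T)^T.
\[
  p_i(\beta) = \frac{\exp(z_i^{\top}\beta)}{1 + \exp(z_i^{\top}\beta)}
\]

% Information matrix of a subdata index set S; the abstract compares
% algorithms by the determinant of this matrix.
\[
  M_S(\beta) = \sum_{i \in S} p_i(\beta)\,\bigl(1 - p_i(\beta)\bigr)\, z_i z_i^{\top},
  \qquad \text{criterion: } \det M_S(\beta)
\]
```

A larger $\det M_S(\beta)$ means the selected subdata carries more information about $\beta$ in the D-optimality sense, which is the sense in which the abstract reports a larger determinant for the proposed algorithm.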
Classification: O212.2 [Science: Probability Theory and Mathematical Statistics]