Authors: Li Baozhen; Zhang Shiying; Guo Hongjian (School of Information Engineering, Nanjing Audit University, Nanjing 211815, China)
Source: Statistics & Decision (《统计与决策》), 2021, No. 19, pp. 5-10 (6 pages)
Funding: National Natural Science Foundation of China (71673122, 72074117); Jiangsu Provincial Social Science Foundation (20WTB007).
Abstract: Thresholds for outlier detection are mostly set from expert experience. In a big-data setting, however, manual threshold setting can neither cope with massive data volumes nor avoid subjectivity and one-sidedness. Based on Bayesian theory, this paper proposes a method for automatically identifying outlier thresholds using a t model and a t mixture model, and applies the Hamiltonian Monte Carlo (HMC) algorithm to posterior inference for the models' hyperparameters. Evaluated on real data from the World Bank and Kaggle using the K-L divergence, the newly constructed model has the following advantages: (1) the t distribution reflects the distributional characteristics of real data better than the normal distribution; (2) the hyperparameters reveal the characteristics of the data distribution's parameters; (3) Bayesian theory reveals the conditional dependencies among hyperparameters, parameters, and data.
Classification No.: O212 [Science: Probability and Mathematical Statistics]
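
Note: the abstract names the K-L divergence as its evaluation index but does not say how it is computed. The following is a minimal sketch of the standard discrete form, D(P || Q) = sum of p_i * log(p_i / q_i), and is not the authors' implementation. The function name `kl_divergence` and the binned frequencies are hypothetical illustrations, not data from the paper.

```python
import math


def kl_divergence(p, q):
    """Discrete K-L divergence D(P || Q) in nats for two probability lists."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a term with p_i = 0 contributes 0 by convention
        if qi == 0.0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical binned frequencies (assumed for illustration only).
empirical = [0.05, 0.20, 0.50, 0.20, 0.05]    # observed histogram
candidate_a = [0.04, 0.22, 0.48, 0.22, 0.04]  # heavier-tailed fit
candidate_b = [0.01, 0.24, 0.50, 0.24, 0.01]  # lighter-tailed fit

print(kl_divergence(empirical, candidate_a))
print(kl_divergence(empirical, candidate_b))
```

A smaller divergence means the candidate distribution is closer to the empirical one. In this made-up example the heavier-tailed candidate scores lower because it assigns more probability to the extreme bins.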