Automatic Threshold Identification of Outliers Based on Bayesian Theory (基于贝叶斯理论的异常点阈值自动识别)  Cited by: 5

Authors: Li Baozhen (李保珍); Zhang Shiying (张诗莹); Guo Hongjian (郭红建) (School of Information Engineering, Nanjing Audit University, Nanjing 211815, China)

Affiliation: [1] School of Information Engineering, Nanjing Audit University, Nanjing 211815

Source: Statistics & Decision (《统计与决策》), 2021, No. 19, pp. 5-10 (6 pages)

Funding: National Natural Science Foundation of China (71673122, 72074117); Jiangsu Provincial Social Science Foundation (20WTB007).

Abstract: Thresholds for outlier detection are mostly set from expert experience. In a big-data setting, however, manual threshold selection can neither keep up with the volume of data nor avoid subjectivity and one-sidedness. Based on Bayesian theory, this paper proposes an automatic outlier-threshold identification method built on a t model and a t mixture model, and uses the Hamiltonian Monte Carlo (HMC) algorithm for posterior inference on the model's hyperparameters. Evaluated on real data from the World Bank and Kaggle with the Kullback-Leibler (K-L) divergence as the metric, the proposed model has the following advantages: (1) the t distribution reflects the distribution of real data better than the normal distribution; (2) the hyperparameters reveal the characteristics of the data distribution parameters; (3) Bayesian theory reveals the conditional dependencies among hyperparameters, parameters, and data.
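The idea behind advantage (1) can be illustrated with a minimal sketch. This is not the paper's Bayesian procedure: it replaces HMC posterior inference with a plain maximum-likelihood fit from SciPy, and the function names (`fit_t_thresholds`, `kl_against_fits`), the `tail_prob` cut-off, and the histogram binning are all assumptions made for illustration only.

```python
import numpy as np
from scipy import stats


def fit_t_thresholds(data, tail_prob=0.01):
    """Fit a Student t distribution by maximum likelihood (an assumption;
    the paper uses Bayesian inference with HMC) and return the lower and
    upper cut-offs that leave tail_prob / 2 probability in each tail."""
    df, loc, scale = stats.t.fit(data)
    lower = stats.t.ppf(tail_prob / 2, df, loc=loc, scale=scale)
    upper = stats.t.ppf(1 - tail_prob / 2, df, loc=loc, scale=scale)
    return lower, upper


def kl_against_fits(data, bins=50):
    """Return the K-L divergence from the data histogram to a fitted
    t distribution and to a fitted normal distribution, in that order."""
    counts, edges = np.histogram(data, bins=bins)
    t_params = stats.t.fit(data)        # (df, loc, scale)
    norm_params = stats.norm.fit(data)  # (loc, scale)
    # Probability mass each fitted model assigns to every histogram bin.
    q_t = np.diff(stats.t.cdf(edges, *t_params))
    q_norm = np.diff(stats.norm.cdf(edges, *norm_params))
    # stats.entropy(p, q) normalises both inputs and computes KL(p || q).
    return float(stats.entropy(counts, q_t)), float(stats.entropy(counts, q_norm))


if __name__ == "__main__":
    # Synthetic heavy-tailed data standing in for a real data set.
    rng = np.random.default_rng(0)
    sample = stats.t.rvs(df=3, size=2000, random_state=rng)
    low, high = fit_t_thresholds(sample)
    flagged = sample[(sample < low) | (sample > high)]
    kl_t, kl_norm = kl_against_fits(sample)
    print(f"thresholds: [{low:.2f}, {high:.2f}], flagged: {flagged.size}")
    print(f"KL to t fit: {kl_t:.4f}, KL to normal fit: {kl_norm:.4f}")
```

A smaller divergence for the t fit than for the normal fit would be in line with the abstract's first claim, though this sketch uses synthetic data and says nothing about the paper's reported results.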

Keywords: outlier detection; threshold identification; Bayesian theory; t mixture model

Classification number: O212 [Science: Probability Theory and Mathematical Statistics]

 
