Outlier Detection Method Based on SCAD Penalty Regression  (Cited by: 9)


Authors: Pan Yingli (潘莹丽); Liu Zhan (刘展); Song Guangyu (宋广雨)

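Affiliations: [1] School of Mathematics and Statistics, Hubei University, Wuhan 430062, China; [2] Hubei Key Laboratory of Applied Mathematics, Hubei University, Wuhan 430062, China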

Source: Statistics & Decision (《统计与决策》), 2022, No. 4, pp. 38-42 (5 pages)

Funding: National Natural Science Foundation of China (Grant No. 11901175).

Abstract: Outlier detection is an active research topic in data analysis. Traditional model-based outlier detection methods usually estimate the model parameters first and then detect outliers; however, outliers distort the parameter estimates, which makes the detection results unreliable. Starting from the linear regression model, this paper introduces outlier indicator variables and proposes a linear mean shift model. For low-dimensional data, a SCAD penalty is applied to the mean shift terms, and a coordinate descent algorithm performs parameter estimation and outlier detection simultaneously. For high-dimensional data, SCAD penalties are applied separately to the model parameters and to the outlier indicator variables, and a coordinate descent algorithm performs parameter estimation, variable selection, and outlier detection simultaneously. The coordinate descent algorithm, built on the linear mean shift model and SCAD penalized regression, eliminates the adverse effect of outliers on parameter estimation in both low- and high-dimensional data.

Keywords: outlier detection; linear mean shift model; SCAD penalty; coordinate descent algorithm

Classification: O212.7 [Science: Probability Theory and Mathematical Statistics]
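
The low-dimensional procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the usual form of the mean shift model, y = Xβ + γ + ε, where a nonzero γ_i marks observation i as an outlier; the abstract does not give the formula, so this form is an assumption. The sketch alternates an ordinary least squares update of β with a coordinate-wise SCAD thresholding update of γ (the closed-form SCAD solution of Fan and Li for a unit design). The function names `scad_threshold` and `fit_mean_shift`, the simulated data, and the tuning value `lam=2.0` are hypothetical choices made only for illustration.

```python
import numpy as np


def scad_threshold(z, lam, a=3.7):
    """Elementwise minimizer of 0.5*(z - g)**2 + SCAD penalty on |g|."""
    z = np.asarray(z, dtype=float)
    absz = np.abs(z)
    # Region |z| <= 2*lam: behaves like soft thresholding.
    soft = np.sign(z) * np.maximum(absz - lam, 0.0)
    # Region 2*lam < |z| <= a*lam: shrinkage is gradually relaxed.
    mid = ((a - 1.0) * z - np.sign(z) * a * lam) / (a - 2.0)
    # Region |z| > a*lam: no shrinkage at all.
    return np.where(absz <= 2.0 * lam, soft,
                    np.where(absz <= a * lam, mid, z))


def fit_mean_shift(X, y, lam, a=3.7, max_iter=500, tol=1e-8):
    """Alternate between beta (least squares) and gamma (SCAD threshold).

    Assumed objective: 0.5*||y - X beta - gamma||^2 + sum_i SCAD(|gamma_i|).
    Returns (beta, gamma); nonzero entries of gamma flag outliers.
    """
    n, p = X.shape
    beta = np.zeros(p)
    gamma = np.zeros(n)
    for _ in range(max_iter):
        # Update beta with the current shifts removed from the response.
        beta_new, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        # Update each gamma_i from its own residual (unit design column).
        gamma_new = scad_threshold(y - X @ beta_new, lam, a)
        change = max(np.max(np.abs(beta_new - beta)),
                     np.max(np.abs(gamma_new - gamma)))
        beta, gamma = beta_new, gamma_new
        if change < tol:
            break
    return beta, gamma


# Illustrative run on simulated data (all values are made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + 0.5 * rng.normal(size=100)
y[:5] += 15.0  # shift the first five responses to create outliers

beta_hat, gamma_hat = fit_mean_shift(X, y, lam=2.0)
print("estimated beta:", np.round(beta_hat, 3))
print("flagged observations:", np.flatnonzero(gamma_hat).tolist())
```

Because the SCAD penalty is non-convex, the result can depend on the starting point and on the tuning parameter; this sketch starts from γ = 0 and fixes `lam` by hand, whereas a real analysis would select it by a data-driven criterion. The high-dimensional variant in the abstract would also apply SCAD thresholding to the β update, which is not shown here.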

 
