Abnormal detection and recovery of pollutant data considering time series characteristics  Cited by: 3


Authors: LU Qiuqin (陆秋琴)[1], WANG Lu (王璐)[1], HUANG Guangqiu (黄光球)[1]

Affiliation: [1] School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China

Source: Journal of Safety and Environment (《安全与环境学报》), 2023, No. 12, pp. 4590-4599 (10 pages)

Funding: National Natural Science Foundation of China (71874134).

Abstract: To address the data distortion and data duplication that can occur while gas sensors collect data, an anomaly detection method based on sliding windows over time series is proposed. The original time series is split into multiple subsequences with a sliding window. The confidence interval distance radius of the slope is used to extract the temporal features of each subsequence and to identify suspected anomalous subsequences. Time series decomposition and Density-based Spatial Clustering of Applications with Noise (DBSCAN) are then used to confirm the outliers. With Volatile Organic Compounds (VOCs) data from a certain region as the validation dataset, the detection results show that the algorithm accurately identifies anomalous subsequences and outliers, with precision, recall, and balanced F-score (F_1) of 93.7%, 90.7%, and 92.18% respectively, which verifies the usability of the proposed method. In addition, for anomalies that are missing values, a recovery model based on Support Vector Regression (SVR) is proposed. In validation its coefficient of determination R^2 is 96.53%, better than that of the comparison models.

Environmental monitoring systems may fail to collect accurate pollutant concentration data from sensor networks because of system failures and other causes. This study proposes detection and processing methods for three common types of anomaly in the acquisition process: distorted data, duplicate data, and missing data. For distorted and duplicate data, an anomaly detection method based on the characteristics of time series is proposed. The method has two stages. The first stage divides the original time series into multiple subsequences using a Sliding Window model, then extracts subsequence features based on the radius of the confidence interval distance of the window slope to identify suspected anomalous sequences. The second stage decomposes the time series of the current window with the Seasonal and Trend decomposition using Loess (STL) method and obtains the residual series by removing the periodic term and the trend term from the original series. Cluster analysis (DBSCAN) is then applied to the residuals, the points that cannot be assigned to any cluster are identified as outliers, and finally the outlier information is output. The Volatile Organic Compounds (VOCs) data of a region are used as the validation dataset. Testing results show that the algorithm can accurately identify abnormal subsequences and outliers. Precision, recall, and F1-score of 93.7%, 90.7%, and 92.18% verify the usability of the proposed method. For missing data, a recovery model based on Support Vector Regression (SVR) is proposed. First, the dimensionality of the input features is reduced using Principal Component Analysis (PCA). The Particle Swarm Optimization (PSO) algorithm is then used to find the optimal parameters, which overcomes the inaccuracy caused by manually set parameters. The recovery model is tested on the validation set and compared with the ARIMA and PSO-SVR algorithms. The results show that the Mean Square Error (MSE), Mean Absolute Error (MAE), and Coefficient of Determination (R^2) of the proposed model are bett…
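
The first detection stage and the reported F1 value can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact rule for the "confidence interval distance radius", so the sketch assumes a simple band of mean ± z standard deviations over all window slopes. The names `ls_slope`, `suspect_windows`, and `f1_score`, and the parameters `width`, `step`, and `z`, are hypothetical. The second stage (STL decomposition followed by DBSCAN on the residuals) is not shown.

```python
from statistics import mean, stdev


def ls_slope(values):
    """Least-squares slope of `values` regressed on their index 0..n-1."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def suspect_windows(series, width, step, z=1.96):
    """Return start indices of windows whose slope falls outside the band.

    Assumption (not from the paper): the band is mean ± z * sample standard
    deviation of all window slopes. Needs at least two windows and width >= 2.
    """
    starts = range(0, len(series) - width + 1, step)
    slopes = [ls_slope(series[s:s + width]) for s in starts]
    centre = mean(slopes)
    radius = z * stdev(slopes)
    return [s for s, k in zip(starts, slopes) if abs(k - centre) > radius]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Synthetic flat signal with one distorted reading at index 24.
    series = [1.0] * 40
    series[24] = 50.0
    print(suspect_windows(series, width=5, step=5))
    print(round(100 * f1_score(0.937, 0.907), 2))
```

On the synthetic series only the window starting at index 20 is flagged, and the F1 helper reproduces the reported 92.18% from the stated precision and recall. A slope-based screen like this misses a spike that sits at the exact centre of a window, which is one reason a second, point-level stage is needed.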

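Keywords: environmental engineering; volatile organic compounds (VOCs); sliding window algorithm (Sliding_Window); time series decomposition; density-based spatial clustering of applications with noise (DBSCAN); support vector regression (SVR)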

Classification code: X511 [Environmental Science and Engineering: Environmental Engineering]

 
