Detecting Abnormal Water Extraction Data Based on Isolation Forest


Authors: XU Hao[1], LIU Huai-li[1], QU Xuan (Anhui & Huaihe River Institute of Hydraulic Research, Hefei 230088, China)

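Affiliation: [1] Anhui & Huaihe River Institute of Hydraulic Research (Anhui Province and the Huaihe River Water Resources Commission, Ministry of Water Resources), Hefei 230088, Anhui, China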

Source: Water Resources and Power (《水电能源科学》), 2024, No. 9, pp. 29-32, 59 (5 pages)

Funding: Anhui Provincial Natural Science Joint Fund Project (2208085US05).

Abstract: To detect outliers in the water withdrawal data of water supply enterprises quickly and accurately, an unsupervised learning algorithm based on isolation forest is proposed. Water withdrawal data from four water supply enterprises (A to D), provided by the Anhui water resources intake monitoring platform, serve as the case study, and the algorithm is compared experimentally with the traditional boxplot method and the supervised k-nearest neighbor algorithm. The results show that, owing to its tree structure, the isolation forest algorithm reaches average F1 and AAUC values of 0.9630 and 0.9980 in point outlier detection, about 56.40% and 22.47% higher than the k-nearest neighbor algorithm and about 18.92% and 9.70% higher than the boxplot method, respectively. Although its performance declines when interval (contiguous-period) abnormal withdrawal behavior is simulated, its stability remains better than that of the k-nearest neighbor algorithm and the boxplot method, which indicates that the isolation forest algorithm has certain advantages in detecting different types of abnormal data.
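The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic (200 simulated daily withdrawals plus six injected point anomalies), the function names are hypothetical, the 3% contamination rate is an assumed setting, and AAUC is assumed to mean the area under the ROC curve. The sketch builds a small one-dimensional isolation forest from the standard library, applies the 1.5 × IQR boxplot rule, and scores both against the known labels.

```python
import math
import random
import statistics


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Split a 1-D sample at uniformly random cut points until points are isolated."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return {"size": len(values)}  # leaf: remember how many points remain
    cut = rng.uniform(min(values), max(values))
    return {
        "cut": cut,
        "left": build_tree([v for v in values if v < cut], depth + 1, max_depth, rng),
        "right": build_tree([v for v in values if v >= cut], depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus a correction for unsplit points."""
    if "cut" not in tree:
        return depth + c_factor(tree["size"])
    branch = tree["left"] if x < tree["cut"] else tree["right"]
    return path_length(branch, x, depth + 1)


def isolation_scores(values, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1); shorter average paths give scores closer to 1."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(values))  # assumes at least 2 points
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(values, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-statistics.fmean(path_length(t, x) for t in trees) / norm)
            for x in values]


def boxplot_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v < q1 - k * iqr or v > q3 + k * iqr for v in values]


def f1_score(truth, pred):
    """Harmonic mean of precision and recall for boolean labels."""
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum(p and not t for t, p in zip(truth, pred))
    fn = sum(t and not p for t, p in zip(truth, pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def auc(truth, scores):
    """ROC AUC as the fraction of (anomaly, normal) pairs ranked correctly."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data (an assumption, not the paper's dataset): normal withdrawals
# around 1000 units with meter dropouts near 0 and spikes near 3000.
data_rng = random.Random(42)
normal = [data_rng.gauss(1000.0, 30.0) for _ in range(200)]
anomalies = [0.0, 5.0, 10.0, 2900.0, 3000.0, 3100.0]
values = normal + anomalies
truth = [False] * len(normal) + [True] * len(anomalies)

scores = isolation_scores(values)
k = round(0.03 * len(values))  # assumed contamination rate of 3%
threshold = sorted(scores, reverse=True)[k - 1]
if_flags = [s >= threshold for s in scores]
box_flags = boxplot_flags(values)

print("isolation forest  F1:", f1_score(truth, if_flags), "AUC:", auc(truth, scores))
print("boxplot rule      F1:", f1_score(truth, box_flags))
```

On real monitoring data, a maintained library implementation such as scikit-learn's `IsolationForest` would be the more usual choice. The hand-rolled version here only serves to show how random cuts isolate extreme values in fewer splits than typical ones.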

Keywords: outlier detection; water withdrawal; isolation forest; k-nearest neighbor; boxplot

Classification codes: TV214 [Hydraulic Engineering: Hydrology and Water Resources]; TU991.31 [Building Science: Municipal Engineering]

 
