Authors: Jie Liu, Yijia Cao, Yong Li, Yixiu Guo, Wei Deng
Affiliations: [1] College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; [2] State Grid Hunan Electric Power Company Limited Research Institute, Changsha 410007, China
Source: CSEE Journal of Power and Energy Systems, 2024, No. 6, pp. 2528-2538 (11 pages). Chinese title: 中国电机工程学会电力与能源系统学报(英文)
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grants U1966207 and 51822702; in part by the Key Research and Development Program of Hunan Province of China under Grant 2018GK2031; in part by the 111 Project of China under Grant B17016; in part by the Innovative Construction Program of Hunan Province of China under Grant 2019RS1016; and in part by the Excellent Innovation Youth Program of Changsha of China under Grant KQ1802029.
Abstract: To improve data quality, this paper studies a big data cleaning method for distribution networks. First, the Local Outlier Factor (LOF) algorithm based on DBSCAN clustering is used to detect outliers. Because the LOF threshold is difficult to determine, a method is proposed that calculates the threshold dynamically based on the transformer districts and time. In addition, the LOF algorithm is combined with a statistical distribution method to reduce the misjudgment rate. To address the diversity and complexity of missing-data patterns in power big data, this paper improves the Random Forest imputation algorithm so that it can be applied to various forms of missing data, especially block-wise missing data and even data that are completely missing along a row or a column. The data used in this paper are real data from 44 transformer districts on a 10 kV line in a distribution network. Experimental results show that the outlier detection is accurate and suitable for power big data of any shape and dimensionality. The improved Random Forest imputation algorithm is suitable for all missing-data patterns, with higher imputation accuracy and better model stability. Comparing network loss prediction on data cleaned with this method against prediction on data with outliers and missing values removed shows that the proposed data cleaning method improves the accuracy of network loss prediction by nearly 4%. Additionally, as the proportion of bad data increases, the difference between the prediction accuracy of cleaned data and that of uncleaned data becomes more significant.
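The abstract names LOF scoring with a dynamically calculated, per-district threshold but does not give the formulas. The sketch below is only an illustration of that general idea, not the authors' method: it computes standard LOF scores in pure Python and then derives a separate threshold for each group from the median and median absolute deviation (MAD) of that group's scores. The MAD rule, the function names (`lof_scores`, `flag_outliers`), the parameters `k` and `c`, and the sample readings are all assumptions made for illustration. The DBSCAN step and the time dimension mentioned in the abstract are not modeled.

```python
import math
from statistics import mean, median


def lof_scores(points, k=3):
    """Standard Local Outlier Factor for each point (ties at the k-th distance ignored)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        nn = order[:k]
        neighbors.append(nn)
        k_dist.append(dist[i][nn[-1]])  # distance to the k-th nearest neighbor
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(1.0 / max(mean(reach), 1e-12))  # guard against duplicates
    # LOF: average neighbor density relative to the point's own density.
    return [mean(lrd[j] for j in neighbors[i]) / lrd[i] for i in range(n)]


def flag_outliers(points, k=3, c=3.0):
    """Flag points whose LOF exceeds median + c * 1.4826 * MAD (assumed rule)."""
    scores = lof_scores(points, k)
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    threshold = med + c * 1.4826 * mad
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical readings grouped by transformer district (made-up values).
districts = {
    "district_A": [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2),
                   (1.2, 1.0), (0.8, 0.9), (1.1, 1.1), (5.0, 5.0)],
    "district_B": [(10.0, 10.0), (10.5, 9.5), (9.5, 10.5), (10.0, 11.0),
                   (11.0, 10.0), (9.0, 9.5), (10.5, 10.5), (30.0, 30.0)],
}

# Each district gets its own threshold because the rule is applied per group.
flagged = {name: flag_outliers(pts) for name, pts in districts.items()}
scores_a = lof_scores(districts["district_A"])
print(flagged)
```

Because the threshold is recomputed from each group's own score distribution, a district with naturally larger readings is not judged against another district's scale, which is the motivation for a per-group threshold in this sketch.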
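Keywords: Data cleaning; DBSCAN; LOF; missing data imputation; outliers detection; Random Forest
Classification: TM76 [Electrical Engineering: Power Systems and Automation]; TP39 [Automation and Computer Technology: Computer Application Technology]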