Authors: MA Hongming (马红明); MA Hao (马浩); YANG Di (杨迪); WU Hongbo (吴宏波); LIU Jiacheng (刘家丞); LI Ji (李骥)
Affiliations: [1] State Grid Hebei Marketing Service Center, Shijiazhuang 050021, China; [2] State Grid Hebei Electric Power Co., Ltd., Shijiazhuang 050021, China; [3] Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an 710049, China
Source: Electrical Measurement & Instrumentation (《电测与仪表》), 2024, No. 9, pp. 120-126 (7 pages)
Funding: National Natural Science Foundation of China (61773308).
Abstract: Under the energy Internet architecture, power marketing big data underpins many advanced smart grid applications, which makes data cleaning especially important. Missing data, however, inevitably arises in actual grid operation and seriously degrades data analysis and use. To address this problem, the paper proposes an online data cleaning framework and method, built on the Spark online big data processing platform, that combines similar-user clustering with singular value thresholding theory. First, singular value decomposition is used to show that power marketing data is approximately low rank. On this basis, and taking into account the differences in consumption among power users, the paper proposes an online data cleaning framework and method that integrates an improved K-nearest neighbor algorithm with singular value thresholding theory. To counter the slow computation of the singular value thresholding model, a sliding time window online recovery strategy is proposed, which speeds up repair and improves recovery accuracy. Finally, the proposed algorithm is validated on power marketing data from Hebei Province. The experimental results show that the online recovery algorithm repairs large-scale missing power marketing data more quickly and effectively.
Keywords: data cleaning; power marketing data; missing data recovery; singular value thresholding algorithm
Classification: TM743 [Electrical Engineering: Power Systems and Automation]
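
The abstract's core idea, recovering missing entries of an approximately low-rank matrix by singular value thresholding, can be illustrated with a minimal NumPy sketch. This is the generic textbook iteration, not the paper's implementation: it omits the improved K-nearest neighbor user grouping, the sliding time window, and the Spark deployment. The function names `shrink` and `svt_complete`, the parameter values (`tau`, `delta`, `max_iter`, `tol`), and the synthetic "users x time" matrix are all assumptions made for illustration.

```python
import numpy as np


def shrink(Y, tau):
    """Soft-threshold the singular values of Y by tau (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s = np.maximum(s - tau, 0.0)   # singular values below tau become zero
    return (U * s) @ Vt            # rebuild the (lower-rank) matrix


def svt_complete(M, mask, tau, delta=1.2, max_iter=500, tol=1e-4):
    """Estimate a low-rank matrix from the entries of M where mask is True.

    tau, delta, max_iter and tol are illustrative defaults, not values
    taken from the paper.
    """
    M_obs = np.where(mask, M, 0.0).astype(float)   # ignore unobserved entries
    Y = np.zeros_like(M_obs)
    X = np.zeros_like(M_obs)
    scale = np.linalg.norm(M_obs) or 1.0
    for _ in range(max_iter):
        X = shrink(Y, tau)
        # Mismatch on observed entries only
        residual = np.where(mask, M_obs - X, 0.0)
        if np.linalg.norm(residual) <= tol * scale:
            break
        Y += delta * residual      # gradient-style update on observed entries
    return X


# Synthetic rank-2 "users x time" matrix with roughly 20% of readings missing
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 24))
mask = rng.random(truth.shape) > 0.2
filled = svt_complete(truth, mask, tau=50.0)

# Keep observed readings as they are; use the estimate only where data is missing
repaired = np.where(mask, truth, filled)
```

How closely `filled` matches the missing entries depends on `tau`, the step size, and the number of iterations, so these would need tuning on real data. Applying the routine separately to each group of similar users and to each recent time window, as the abstract describes, keeps every matrix small, which is presumably where the reported speed-up comes from.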