An Incremental Mining Method for Parallel Association Rules in Power Big Data Using Eclat Algorithm

Authors: SUN Yu (孙瑜); REN Gaoming (任高明) (School of Computer and Software, Shaanxi National Defense Industrial Vocational and Technical College, Xi'an 710000, Shaanxi Province, China)

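Affiliation: [1] School of Computer and Software, Shaanxi National Defense Industrial Vocational and Technical College, Xi'an 710000, Shaanxi Province, China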

Source: Electric Power Information and Communication Technology (《电力信息与通信技术》), 2025, No. 1, pp. 83-88 (6 pages)

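Funding: Natural Science Research Project of the Shaanxi Provincial Department of Education, "Research on Key Technologies of Traffic Measurement for High-Speed Networks" (19JK0085); Research Program Project of Shaanxi National Defense Industrial Vocational and Technical College, "Research on Sketch-Based Network Traffic Measurement Methods" (Gfy22-13).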

Abstract: Power big data is time-varying. If a mining method cannot process newly added data in real time and promptly discover the updated association rules among the data, its results may lag and become inaccurate, reducing mining accuracy. To address this, the article proposes an incremental mining method for parallel association rules in power big data using the Eclat algorithm. A similar-item merging strategy removes misleading information caused by data redundancy and noise, improving the quality of the power big data. The Eclat algorithm is optimized with the min-hash principle: a MinHash matrix is built to estimate the candidate itemsets of the original dataset, and these are pruned to reduce the complexity of data comparison and storage and to improve mining efficiency. The incremental update principle is then used to obtain the updated candidate itemsets, and the Hash Eclat algorithm quickly updates the existing association rules, achieving incremental mining of parallel association rules in big data and improving the accuracy of association rule mining. Experimental results show that, when mining association rules with this method, I/O usage stays below 200 kB, CPU usage is below 20%, the numbers of missed detections and false alarms are as low as 0, network traffic is as low as 268 MB, and the area under the ROC curve is relatively large. Compared with current mining methods, the method offers higher mining accuracy and better mining performance.
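The pipeline outlined in the abstract (vertical Eclat mining, MinHash-based candidate pruning, and incremental updating) can be illustrated with a minimal Python sketch. This is not the authors' Hash Eclat implementation: all function names, the `slack` parameter, and the choice to apply the MinHash pre-filter only to 2-item joins are assumptions made for illustration. The incremental step here merges new transactions into the vertical index and re-mines, which is simpler than the update procedure the article describes. The similar-item merging step and parallel execution are omitted.

```python
import random

P = 2_147_483_647  # Mersenne prime 2^31 - 1, modulus for the hash family


def build_vertical(transactions, start_tid=0):
    """Map each item to its tidset (ids of the transactions containing it)."""
    index = {}
    for tid, items in enumerate(transactions, start=start_tid):
        for item in items:
            index.setdefault(item, set()).add(tid)
    return index


def make_hashes(k, seed=0):
    """Draw k hash functions h(t) = (a*t + b) mod P (illustrative family)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]


def signature(tidset, hashes):
    """MinHash signature: the minimum hash value under each function."""
    return [min((a * t + b) % P for t in tidset) for a, b in hashes]


def est_intersection(sig_a, sig_b, len_a, len_b):
    """Estimate |A & B| from the estimated Jaccard similarity J,
    using |A & B| = J * (|A| + |B|) / (1 + J)."""
    j = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
    return j * (len_a + len_b) / (1 + j)


def eclat(index, min_sup, hashes=None, slack=0.5):
    """Depth-first Eclat over tidsets. If hashes are given, 2-item joins
    whose estimated support is below slack * min_sup are skipped, so the
    result may miss some itemsets (assumed trade-off of this sketch)."""
    frequent = {}
    sigs = {i: signature(t, hashes) for i, t in index.items()} if hashes else {}

    def recurse(prefix, candidates):
        for pos, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            extensions = []
            for other, other_tids in candidates[pos + 1:]:
                if hashes and not prefix:
                    est = est_intersection(sigs[item], sigs[other],
                                           len(tids), len(other_tids))
                    if est < slack * min_sup:
                        continue  # pruned without computing the intersection
                joined = tids & other_tids  # exact support check
                if len(joined) >= min_sup:
                    extensions.append((other, joined))
            recurse(itemset, extensions)

    roots = sorted(((i, t) for i, t in index.items() if len(t) >= min_sup),
                   key=lambda kv: kv[0])
    recurse((), roots)
    return frequent


def add_increment(index, new_transactions, next_tid):
    """Merge newly arrived transactions into the vertical index in place
    and return the next free transaction id."""
    delta = build_vertical(new_transactions, start_tid=next_tid)
    for item, tids in delta.items():
        index.setdefault(item, set()).update(tids)
    return next_tid + len(new_transactions)
```

In use, one would call `build_vertical` on the initial transactions, run `eclat` to obtain the frequent itemsets, and after each new batch call `add_increment` followed by `eclat` again. Every reported support count is exact because the tidset intersection is always computed before an itemset is accepted; the MinHash estimate only decides which joins are attempted.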

Keywords: Eclat algorithm; power big data; parallel rules; incremental mining; data item merging

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
