An Efficient Association Rule Hiding Algorithm Based on Cluster and Threshold Interval (cited by: 9)

Authors: Niu Xinzheng[1]; Wang Chongyi; Ye Zhijia[1]; She Kun[2]

Affiliations: [1] School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731; [2] School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054

Source: Journal of Computer Research and Development, 2017, No. 12, pp. 2785-2796 (12 pages)

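Funding: National Natural Science Foundation of China (61300192); National Key Technology R&D Program of China (2013BAH33F02); Fundamental Research Funds for the Central Universities (ZYGX2014J052); Sichuan Province Science and Technology Support Program (2015GZ0096); Soft Science Research Project of the Chengdu Science and Technology Bureau (2015-RK00-00046-ZF); Research Project of the Sichuan Provincial Public Security Department (2015SCYYCX06); Project of the Zigong Public Security Bureau, Sichuan Province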

Abstract: Association rule hiding is an important method in privacy-preserving data mining (PPDM). Existing association rule hiding algorithms operate directly on the transaction data and therefore incur high I/O overhead. To address this, we propose FP-DSRRC, a fast association rule hiding algorithm based on the FP-tree. The algorithm first improves the FP-tree structure by adding a transaction-number index and establishing a bidirectional traversal structure, and then uses the improved FP-tree to process transaction information quickly, avoiding the heavy I/O time caused by traversing the original data set. Next, it builds and maintains a transaction index table to locate sensitive items quickly, and it processes association rules with a clustering strategy, eliminating sensitive rules cluster by cluster. At the same time, it adopts threshold intervals for rule support and confidence, which reduces the impact of the hiding process on the original data set. Finally, experiments show that, compared with traditional association rule hiding algorithms, FP-DSRRC reduces execution time by 50%~70% while preserving the quality of the generated data set, and that it remains usable on large-scale real data sets.
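The idea of an FP-tree extended with a transaction-number index and two-way links can be illustrated with a minimal Python sketch. It is not the FP-DSRRC implementation from the paper: the names `Node`, `build_tree` and `tids_containing`, the choice to store each transaction number at the node where its path ends, and the toy data are all assumptions made for illustration. The sketch only shows how such a tree could answer "which transactions contain this itemset?" without rescanning the raw data, which is what the support and confidence of a rule are computed from.

```python
from collections import defaultdict


class Node:
    """FP-tree node with a parent link and a list of transaction numbers (assumed layout)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent   # upward link (child -> parent)
        self.children = {}     # downward links (item -> child node)
        self.count = 0         # transactions whose path passes through this node
        self.tids = []         # numbers of transactions whose path ends here


def build_tree(transactions):
    """Build the tree from {tid: items} and return (rank, header).

    rank maps each item to its position in the global frequency order;
    header maps each item to all nodes that carry it.
    """
    freq = defaultdict(int)
    for items in transactions.values():
        for item in set(items):
            freq[item] += 1
    ordered_items = sorted(freq, key=lambda it: (-freq[it], it))
    rank = {item: pos for pos, item in enumerate(ordered_items)}

    root = Node(None)
    header = defaultdict(list)
    for tid, items in transactions.items():
        node = root
        for item in sorted(set(items), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
        node.tids.append(tid)
    return rank, header


def tids_containing(rank, header, itemset):
    """Return sorted numbers of transactions containing every item of a non-empty itemset."""
    items = set(itemset)
    if not items or any(item not in rank for item in items):
        return []
    # The lowest-ranked item of the itemset sits deepest on any path holding it.
    deepest = max(items, key=rank.__getitem__)
    others = items - {deepest}
    found = []
    for node in header[deepest]:
        # Upward traversal: collect the items on the path back to the root.
        seen = set()
        up = node.parent
        while up is not None and up.item is not None:
            seen.add(up.item)
            up = up.parent
        if others <= seen:
            # Downward traversal: every transaction ending in this subtree matches.
            stack = [node]
            while stack:
                cur = stack.pop()
                found.extend(cur.tids)
                stack.extend(cur.children.values())
    return sorted(found)


# Toy data (hypothetical): five transactions over items a, b, c, d.
transactions = {
    1: ["a", "b", "c"],
    2: ["a", "b"],
    3: ["a", "c"],
    4: ["b", "c"],
    5: ["a", "b", "c", "d"],
}
rank, header = build_tree(transactions)
ab = tids_containing(rank, header, {"a", "b"})
a = tids_containing(rank, header, {"a"})
print(len(ab) / len(transactions))  # support of rule a -> b: 0.6
print(len(ab) / len(a))             # confidence of rule a -> b: 0.75
```

In this sketch the raw transactions are read once, when the tree is built; every later lookup walks only tree nodes. A hiding step that lowers the support or confidence of a sensitive rule below its threshold would start from the transaction numbers returned by such a lookup, but how FP-DSRRC selects and modifies those transactions is not specified here.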

Keywords: privacy preserving; association rule hiding; frequent pattern tree (FP-tree); sensitive rule; data sanitization

Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]
