Frequent pattern mining algorithm based on DiffNodeset

Cited by: 3


Authors: YIN Yuan; ZHU Lu-wei [1,2]; WEN Kai

Affiliations: [1] School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; [2] Institute of Applied Communication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; [3] Chongqing Information Technology Designing Limited Company, Chongqing 401121, China; [4] Chongqing Branch, China Telecom Limited Company, Chongqing 401121, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2020, No. 3, pp. 716-720 (5 pages)

Abstract: To address the complex tree construction and low mining efficiency of current frequent pattern mining algorithms, a Top-rank-k frequent pattern mining algorithm based on DiffNodeset, named DNTK, is proposed. The DiffNodeset of a k-itemset (k > 2) is obtained directly with a set difference operation, which avoids repeated, complex join steps between itemsets. By combining a linear-time join method with an early pruning strategy, a more efficient join method for 1-itemsets is proposed that determines promptly whether two itemsets can be joined. A subsume index strategy is used to reduce the number of itemset joins. Experimental results show that DNTK outperforms the FAE and NTK algorithms in both time and space efficiency, and that it performs well when mining frequent itemsets on different types of datasets.
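The set difference idea in the abstract can be illustrated with a minimal sketch. It is not the DNTK algorithm: it uses plain transaction-id sets in the style of the general diffset technique rather than the tree-based node sets that a DiffNodeset is built from, and it assumes the common definition of top-rank-k (itemsets whose support is among the k highest distinct support values). The function names `build_tidsets`, `supports_by_diffset`, and `top_rank_k` and the sample transactions are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_tidsets(transactions):
    """Map each item to the set of ids of the transactions that contain it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def supports_by_diffset(transactions, a, b, c):
    """Return (sup(ab), sup(abc)), using only set differences above the 1-itemset level."""
    t = build_tidsets(transactions)
    d_ab = t[a] - t[b]              # transactions with a but without b
    d_ac = t[a] - t[c]              # transactions with a but without c
    sup_ab = len(t[a]) - len(d_ab)
    d_abc = d_ac - d_ab             # transactions with a and b but without c
    sup_abc = sup_ab - len(d_abc)
    return sup_ab, sup_abc


def top_rank_k(transactions, k):
    """Brute-force reference (assumed definition): keep itemsets whose support
    is among the k highest distinct support values."""
    items = sorted({i for tx in transactions for i in tx})
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = sum(1 for tx in transactions if set(combo) <= set(tx))
            if count > 0:
                support[combo] = count
    kept = sorted(set(support.values()), reverse=True)[:k]
    return {itemset: s for itemset, s in support.items() if s in kept}


# Hypothetical sample data, not taken from the paper's experiments.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

print(supports_by_diffset(transactions, "a", "b", "c"))  # (3, 2)
print(top_rank_k(transactions, 1))
```

The difference sets shrink as itemsets grow, which is why a difference-based representation can save both time and memory compared with intersecting full occurrence lists. The brute-force `top_rank_k` is exponential in the number of items and serves only as a correctness reference for small inputs.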

Keywords: frequent itemset mining; DiffNodeset; Top-rank-k itemsets; early pruning; subsume index

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
