Fast Single Phase Algorithm for Utility Mining in Big Data  (Cited by: 1)


Authors: 刘君强 (Liu Junqiang)[1], 周青峰 (Zhou Qingfeng)[1], 王文慧 (Wang Wenhui)[2], 时磊 (Shi Lei)[1]

Affiliations: [1] Zhejiang Gongshang University; [2] Zhejiang University of Water Resources and Electric Power

Source: Telecommunications Science (《电信科学》), 2015, No. 4, pp. 78-86 (9 pages)

Funding: National Natural Science Foundation of China (No. 61272306); Zhejiang Provincial Natural Science Foundation of China (No. LY12F02024)

Abstract: Most recent utility mining algorithms generate a huge number of candidate patterns when dealing with big data, which creates a scalability bottleneck. A few algorithms avoid candidate generation, but they suffer from high computation overhead and lack effective pruning, so their running efficiency becomes the bottleneck. To address this, a novel algorithm is proposed that finds high utility patterns in a single phase without generating candidates. It has three novelties: first, pattern enumeration based on prefix growth, with a pruning strategy based on estimated upper bounds on utility; second, a representation of utility information based on a sparse matrix and pseudo projection; third, a memory-saving depth-first search that does not materialize high utility patterns in memory. Extensive experiments on synthetic and real-world data show that the proposed algorithm runs more than 5 times faster than existing algorithms, uses 20% to 60% less memory, and scales well.
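
The prefix-growth idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or its data structure. It is a minimal illustration that assumes a commonly used upper bound (the pattern's utility plus the utility of the items that follow it in each supporting transaction), non-negative utilities, and a simple list of (row, position, prefix utility) triples standing in for pseudo projection. The function name `mine_high_utility` and the toy database are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Occurrence = Tuple[int, int, int]  # (row index, next position, prefix utility)


def mine_high_utility(
    db: List[Dict[str, int]], min_util: int
) -> Iterator[Tuple[Tuple[str, ...], int]]:
    """Yield (pattern, utility) for every itemset with utility >= min_util.

    Assumes every item utility is non-negative; the pruning bound is
    only valid under that assumption.
    """
    # Sort each transaction by item so patterns grow in one fixed order.
    rows = [sorted(t.items()) for t in db]

    # rest[r][k] = total utility of the items at positions >= k in row r.
    rest: List[List[int]] = []
    for row in rows:
        acc = [0] * (len(row) + 1)
        for k in range(len(row) - 1, -1, -1):
            acc[k] = acc[k + 1] + row[k][1]
        rest.append(acc)

    def grow(prefix: Tuple[str, ...], proj: List[Occurrence]):
        # Collect, per extension item, pointers into the original rows
        # instead of copying transactions (a stand-in for pseudo projection).
        ext: Dict[str, List[Occurrence]] = {}
        for r, start, pu in proj:
            for k in range(start, len(rows[r])):
                item, u = rows[r][k]
                ext.setdefault(item, []).append((r, k + 1, pu + u))

        for item in sorted(ext):
            occ = ext[item]
            pattern = prefix + (item,)
            util = sum(pu for _, _, pu in occ)
            if util >= min_util:
                yield pattern, util
            # Upper bound on the utility of any extension of `pattern`.
            bound = sum(pu + rest[r][k] for r, k, pu in occ)
            if bound >= min_util:
                yield from grow(pattern, occ)  # depth-first descent

    yield from grow((), [(r, 0, 0) for r in range(len(rows))])


# Toy database: each transaction maps an item to its utility in that row.
toy_db = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]
result = dict(mine_high_utility(toy_db, min_util=15))
```

Because the function is a generator, qualifying patterns are handed to the caller one at a time rather than collected in memory, which loosely mirrors the abstract's point about not materializing high utility patterns. On the toy database with `min_util=15`, the sketch reports `('a',)`, `('a', 'c')`, and `('b', 'c', 'd')`.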

Keywords: big data; utility mining; high utility pattern; frequent pattern

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
