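Authors: Liu Junqiang (刘君强)[1], Zhou Qingfeng (周青峰)[1], Wang Wenhui (王文慧)[2], Shi Lei (时磊)[1]
Affiliations: [1] Zhejiang Gongshang University; [2] Zhejiang University of Water Resources and Electric Power
Source: Telecommunications Science (《电信科学》), 2015, No. 4, pp. 78-86 (9 pages)
Funding: National Natural Science Foundation of China (No.61272306); Zhejiang Provincial Natural Science Foundation (No.LY12F02024)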
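Abstract: Existing data mining algorithms generate a large number of candidate patterns when mining big data, which creates a scalability bottleneck. A few algorithms avoid candidate generation, but they are computationally expensive and lack effective pruning, so their running efficiency is limited. To address this, a new single-phase data mining algorithm that generates no candidate patterns is proposed. It has three novel aspects: (1) pattern enumeration based on prefix growth, with a pruning strategy based on evaluating utility upper bounds; (2) a representation of utility information based on a sparse matrix and pseudo projection; (3) a depth-first search method that saves storage space. Extensive experiments show that the new algorithm is more than 5 times faster than existing algorithms, uses 20%~60% less memory, and scales well.
English abstract: Most of the latest work on utility mining generates a huge number of candidates when dealing with big data, and therefore suffers from a scalability issue. Some work does not generate candidates, but suffers from an efficiency issue due to a lack of strong pruning and high computation overhead. A novel algorithm that finds high utility patterns in a single phase without generating candidates is proposed. The novelties lie in a prefix growth strategy with strong pruning, and a sparse matrix based representation of transactions with pseudo projection. The proposed algorithm works in a depth-first manner and does not materialize high utility patterns in memory, which further improves scalability. Extensive experiments on synthetic and real-world data show that the proposed algorithm outperforms the latest work in terms of running time, memory overhead, and scalability.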
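The ideas named in the abstract (prefix growth, pruning by a utility upper bound, a sparse row representation with pseudo projection, and depth-first search) can be illustrated with a small sketch. This is not the authors' algorithm: the function name `mine_high_utility`, the input layout (one dict of item to utility per transaction), and the specific bound used (utility of the prefix plus the utility of the remaining items in each supporting transaction) are assumptions chosen for illustration, and utilities are assumed to be non-negative.

```python
from typing import Dict, List, Tuple


def mine_high_utility(
    transactions: List[Dict[str, int]], min_util: int
) -> Dict[Tuple[str, ...], int]:
    """Illustrative single-phase high utility pattern miner (hypothetical sketch).

    Each transaction maps an item to its (non-negative) utility in that
    transaction. Returns every itemset whose total utility is >= min_util.
    """
    # Sparse row representation: each row holds only the items present,
    # sorted by item name so that prefixes can be grown in a fixed order.
    rows = [sorted(t.items()) for t in transactions]

    # rest[i][k] = total utility of the items at positions >= k in row i.
    rest = []
    for row in rows:
        suffix = [0] * (len(row) + 1)
        for k in range(len(row) - 1, -1, -1):
            suffix[k] = suffix[k + 1] + row[k][1]
        rest.append(suffix)

    results: Dict[Tuple[str, ...], int] = {}

    def grow(prefix: Tuple[str, ...], projection: List[Tuple[int, int, int]]) -> None:
        # Pseudo projection: rows are never copied. Each entry is
        # (row index, next position to scan, utility of prefix in that row).
        extensions: Dict[str, List[Tuple[int, int, int]]] = {}
        for i, pos, prefix_util in projection:
            row = rows[i]
            for k in range(pos, len(row)):
                item, util = row[k]
                extensions.setdefault(item, []).append((i, k + 1, prefix_util + util))

        for item in sorted(extensions):
            proj = extensions[item]
            pattern = prefix + (item,)
            utility = sum(u for _, _, u in proj)
            if utility >= min_util:
                results[pattern] = utility
            # Upper bound on the utility of any pattern extending `pattern`:
            # no extension can gain more than the items remaining in each row.
            bound = sum(u + rest[i][pos] for i, pos, u in proj)
            if bound >= min_util:
                grow(pattern, proj)  # depth-first recursion

    grow((), [(i, 0, 0) for i in range(len(rows))])
    return results


if __name__ == "__main__":
    sample = [
        {"a": 5, "b": 2, "c": 1},
        {"a": 10, "c": 6},
        {"b": 4, "c": 3, "d": 8},
        {"a": 5, "b": 2, "d": 2},
    ]
    for pattern, utility in sorted(mine_high_utility(sample, 15).items()):
        print(pattern, utility)
```

In this sketch a branch is abandoned as soon as its bound falls below `min_util`, so no candidate set is built and re-verified in a second pass. Unlike the algorithm described in the abstract, it collects results in a dictionary for convenience rather than avoiding materializing them in memory.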
Classification: TP311.13 [Automation and Computer Technology - Computer Software and Theory]