检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]郑州大学信息工程学院,郑州450001 [2]北京交通大学计算机与信息技术学院,北京100044
出 处:《计算机研究与发展》2010年第3期463-473,共11页Journal of Computer Research and Development
基 金:国家"九七三"重点基础研究发展计划基金项目(2006CB705500);国家"八六三"高技术研究发展计划基金项目(2007AA010408)
摘 要:数据流频繁项集挖掘是目前数据挖掘与知识发现领域的热点研究课题,在许多领域有重要应用.然而支持度阈值的设定需要一定的领域知识,设置不当会给后续的分析处理带来很多困难和不必要的负担,因此挖掘数据流top-K频繁项集有重要意义.提出一个挖掘数据流界标窗口top-K频繁项集的动态增量近似算法TOPSIL-Miner,为此设计了存储流数据摘要信息的概要结构TOPSIL-Tree以及动态记录挖掘相关信息的树层最大支持度表MaxSL、项目序表OIL,TOPSET和最小支持度表MinSL等,并分析了与这些概要结构相关的挖掘特性.在此基础上研究算法的3种优化措施:1)剪枝当前数据流的平凡项集;2)挖掘过程中启发式自适应提升挖掘阈值;3)动态提升剪枝阈值.对算法的误差上界进行了分析研究.最后通过实验验证了算法的可行性、精确性和时空高效性.Frequent knowledge discovery itemset mining over data streams becomes a hot topic in data mining and recently, which has been applied to different areas, However, the setting of a minimum support threshold needs some domain knowledge. It will bring many difficulties or much burden to users if the support threshold is not set reasonably. It is interesting for users to find top-K significant itemsets over data streams. A dynamic incremental approximate algorithm, TOPSIL- Miner, is presented to mine top-K significant itemsets in landmark windows. A new data structure, TOPSIL-Tree, is designed to store the potential significant itemsets, and other data structures of maximum support list, ordered item list, TOPSET and minimum support list are devised to maintain the information about mining results. Moreover, three optimization strategies are exploited to reduce the time and space cost of the algorithm. 1) pruning trivial nodes in the current data stream; 2) promoting mining support threshold during mining process heuristically and adaptively; and 3) promoting pruning threshold dynamically. The accuracy of the algorithm is also analyzed. Extensive experiments are performed to evaluate the good effectiveness, the high efficiency and precision of the algorithm.
关 键 词:数据挖掘 数据流 界标窗口 频繁项集 概要数据结构
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.15