检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]复旦大学计算机科学与工程系
出 处:《计算机学报》2006年第4期625-634,共10页Chinese Journal of Computers
基 金:国家"九七三"重点基础研究发展规划项目基金(2001CCA03000);国家"八六三"高技术研究发展计划项目基金(2001AA113130);国家自然科学基金重点项目(60496325);中国科学院智能科学项目基金等资助
摘 要:在关联规则挖掘过程中,现有的取样误差量化方法和快速估计算法存在着不足,对此提出了一种新的取样误差量化三元组模型,并在实验观察和理论分析的基础上给出了一种取样误差的快速估计算法———主误差区间估计法.理论分析和实验结果均表明,此方法不但可以精确、有效地度量出样本集与原始数据集包含的频繁模式信息间的差异,而且,主误差区间估计法还可以精确、快速地估计出取样误差,并能灵活地嵌入到关联规则挖掘的各种取样方法之中;其核心思想还可以用于改进分布、并行关联规则挖掘方法的效率.Sampling is a simple and effective technique to improve the efficiency and the scalability of algorithms for association rule mining. However, there is lack of necessary research to define the degree of error with respect to the outcome of the algorithm, i. e. , a quantitative model to measure the sampling error, and to estimate the error efficiently. In this paper, based on the systematic analysis, the authors point out the deficiency of the current results in this field and give a novel, flexible quantitative model to measure the sampling error, and propose a high efficient computational method, interval estimation algorithm of cardinal error, for estimating sampling error based on the real observation and the theoretical analysis. Both of theoretical analysis and realistic experiments show the error between sample set and original dataset can be obtained effectively and accurately by the model, the representative capability of sample set to original dataset also can be estimated efficiently and exactly by the interval estimation algorithm of cardinal error. What's more, the interval estimation algorithm of cardinal error can be conveniently nested into sampling algorithms to speed up them and is useful for distributed, parallel association rule mining algorithms.
关 键 词:关联规则 频繁项集 取样误差 主误差 PAC学习
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.33