关联规则挖掘的取样误差量化模型和快速估计算法  被引量:7

Quantitative Model and Fast Estimation Algorithm of Sampling Error for Association Rule Mining

在线阅读下载全文

作  者:贾彩燕[1] 陆汝钤[1] 

机构地区:[1]复旦大学计算机科学与工程系

出  处:《计算机学报》2006年第4期625-634,共10页Chinese Journal of Computers

基  金:国家"九七三"重点基础研究发展规划项目基金(2001CCA03000);国家"八六三"高技术研究发展计划项目基金(2001AA113130);国家自然科学基金重点项目(60496325);中国科学院智能科学项目基金等资助

摘  要:在关联规则挖掘过程中,现有的取样误差量化方法和快速估计算法存在着不足,对此提出了一种新的取样误差量化三元组模型,并在实验观察和理论分析的基础上给出了一种取样误差的快速估计算法———主误差区间估计法.理论分析和实验结果均表明,此方法不但可以精确、有效地度量出样本集与原始数据集包含的频繁模式信息间的差异,而且,主误差区间估计法还可以精确、快速地估计出取样误差,并能灵活地嵌入到关联规则挖掘的各种取样方法之中;其核心思想还可以用于改进分布、并行关联规则挖掘方法的效率.Sampling is a simple and effective technique to improve the efficiency and the scalability of algorithms for association rule mining. However, there is lack of necessary research to define the degree of error with respect to the outcome of the algorithm, i. e. , a quantitative model to measure the sampling error, and to estimate the error efficiently. In this paper, based on the systematic analysis, the authors point out the deficiency of the current results in this field and give a novel, flexible quantitative model to measure the sampling error, and propose a high efficient computational method, interval estimation algorithm of cardinal error, for estimating sampling error based on the real observation and the theoretical analysis. Both of theoretical analysis and realistic experiments show the error between sample set and original dataset can be obtained effectively and accurately by the model, the representative capability of sample set to original dataset also can be estimated efficiently and exactly by the interval estimation algorithm of cardinal error. What's more, the interval estimation algorithm of cardinal error can be conveniently nested into sampling algorithms to speed up them and is useful for distributed, parallel association rule mining algorithms.

关 键 词:关联规则 频繁项集 取样误差 主误差 PAC学习 

分 类 号:TP311[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象