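Affiliations: [1] Department of Computer and Information Science, Fujian University of Technology, Fuzhou, Fujian 350108 [2] School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu 210096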
Source: 《数学的实践与认识》 (Mathematics in Practice and Theory), 2013, No. 8, pp. 160-169 (10 pages)
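Funding: National Natural Science Foundation of China (61073059); Natural Science Foundation of Fujian Province (2012J01245); Fujian University of Technology Research Start-up Fund (GY-Z12003)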
Abstract: Sampling is a general and effective approximation technique, and using it to answer aggregation queries approximately is common practice in decision support systems and data mining tools. The key goal of such query processing is to return approximate results correctly and efficiently while minimizing the approximation error. Congressional Samples is a representative and influential sampling method for approximate aggregation queries, but it is sub-optimal in some scenarios. Based on an in-depth study of its shortcomings and the limits of its applicability, this paper proposes OptCongress, an optimized version of Congressional Samples. When the data within a group has a high-variance distribution, OptCongress overcomes the weakness of the original algorithm's simple uniform sampling, significantly reducing approximation errors compared with the original Congressional Samples. It also introduces a new algorithm for allocating the number of samples to each group, which tries to minimize the MSE of the expected query distribution; this addresses the original allocation algorithm's lack of a rigorous formulation, which made it difficult to evaluate theoretically. Finally, a set of experiments on a modified TPC-H database demonstrates the correctness and effectiveness of the proposed technique.
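The abstract does not give the OptCongress formulas, so the following is only a minimal sketch of the general idea it refers to: giving more sample slots to groups whose data varies more, then answering a group-by SUM from the samples. It uses Neyman allocation, a standard textbook technique, as a stand-in; it is not the paper's allocation algorithm. The function names, the group names, and the synthetic data are all hypothetical.

```python
import random
import statistics


def neyman_allocation(groups, budget):
    """Split `budget` sample slots across groups in proportion to N_g * sigma_g.

    This is textbook Neyman allocation, used here as an assumption;
    it is NOT the OptCongress allocation from the paper.
    Rounding means the slots may not add up to `budget` exactly.
    """
    weights = {
        name: len(values) * statistics.pstdev(values)
        for name, values in groups.items()
    }
    total = sum(weights.values())
    if total == 0:
        # Every group is constant: fall back to an equal split.
        equal = max(1, budget // len(groups))
        return {name: min(len(groups[name]), equal) for name in groups}
    return {
        # At least one row per group, never more rows than the group holds.
        name: min(len(groups[name]), max(1, round(budget * w / total)))
        for name, w in weights.items()
    }


def approximate_group_sums(groups, allocation, rng):
    """Estimate SUM per group by scaling each sample mean by the group size."""
    estimates = {}
    for name, values in groups.items():
        sample = rng.sample(values, allocation[name])
        estimates[name] = len(values) * statistics.fmean(sample)
    return estimates


# Hypothetical data: one low-variance group and one high-variance group.
rng = random.Random(0)
groups = {
    "steady": [100.0 + rng.uniform(-1, 1) for _ in range(1000)],
    "skewed": [rng.expovariate(1 / 100.0) for _ in range(1000)],
}
allocation = neyman_allocation(groups, budget=200)
estimates = approximate_group_sums(groups, allocation, rng)

for name in groups:
    exact = sum(groups[name])
    print(name, allocation[name], round(estimates[name]), round(exact))
```

With this allocation the high-variance group receives nearly all of the sample budget, while the near-constant group can be estimated well from very few rows. A purely uniform per-group sample would spend the same effort on both.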
Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]