Authors: FANG Jun (房俊)[1,2], XUE Xiaodong (薛晓东), ZHOU Yunliang (周云亮)
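Affiliations: [1] School of Information, North China University of Technology, Beijing 100144, China; [2] Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing 100144, China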
Source: Computer Engineering (《计算机工程》), 2023, No. 11, pp. 284-292, 301 (10 pages)
Funding: National Natural Science Foundation of China, International (Regional) Cooperation and Exchange Program (62061136006).
Abstract: Most current approximate query methods answer a query with a single estimated value. Such point estimation is simple but subject to error. Interval estimation methods must compute over a large number of samples, which causes high query latency and makes them hard to apply widely in practice. Model-driven approximate query techniques have an efficiency advantage, but their results lack reliability guarantees. To address this, an approximate query method that combines data sampling with machine learning is proposed. It uses a deep generative model to improve query efficiency and answers queries with interval estimates instead of point estimates; that is, it derives a relatively reliable interval from the query results of multiple samples. First, an improved Generative Adversarial Network (GAN) model learns the data distribution, so that multiple samples can be generated quickly without accessing the dataset. Next, a massively parallel processing architecture distributes the computing tasks of sample generation and query execution. Finally, the query result is returned to the user. Experimental results show that the Normalized Confidence Interval Coverage (NCIC) of the interval estimates for aggregate queries exceeds 85%. In query experiments with the aggregate function COUNT and selectivity below 0.03, on the ROAD and PM2.5 datasets, the NCIC of the proposed method is 13.9% and 14.8% higher, respectively, than that of random sampling. Although query latency increases relative to the baseline method, it still meets the requirements of common applications.
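The core idea in the abstract, answering a COUNT query with an interval built from several per-sample results, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `make_sampler` is a hypothetical stand-in that draws random subsets of an in-memory table (the paper's GAN generator does not access the data at all), the samples are processed sequentially instead of on a parallel architecture, and the interval is simply the mean of the per-sample estimates plus or minus `z` standard deviations, which may differ from the construction used in the paper.

```python
import random
import statistics


def make_sampler(population, sample_size, rng):
    """Hypothetical stand-in for a trained generator: returns a function
    that yields one sample of rows each time it is called."""
    def draw_sample():
        return rng.sample(population, sample_size)
    return draw_sample


def scaled_count(predicate, population_size):
    """Build a COUNT query that scales a sample count up to the full table."""
    def query(sample):
        matches = sum(1 for row in sample if predicate(row))
        return matches * population_size / len(sample)
    return query


def interval_estimate(draw_sample, query, num_samples=30, z=1.96):
    """Run the query on several samples and return a (low, high) interval.

    Assumption: the interval is the mean of the per-sample estimates
    plus or minus z sample standard deviations.
    """
    estimates = [query(draw_sample()) for _ in range(num_samples)]
    center = statistics.fmean(estimates)
    half_width = z * statistics.stdev(estimates)
    return center - half_width, center + half_width


# Synthetic table of 10,000 rows; the predicate has selectivity 0.03.
rng = random.Random(0)
population = list(range(10_000))
draw = make_sampler(population, sample_size=1_000, rng=rng)
count_query = scaled_count(lambda x: x < 300, len(population))
low, high = interval_estimate(draw, count_query)
```

Here the exact answer is 300 rows, so a reader can check whether the returned interval covers it; repeating that check over many queries is the intuition behind a coverage metric such as NCIC.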
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]