Research on Data Query Optimization Method for Non-public Economy

Authors: WANG Wei (王威); YANG Jingqi (杨靖琦); TIAN Chengdong (田承东) (Information Science Academy of China Electronics Technology Group Corporation, Beijing 100041)

Affiliation: [1] Information Science Academy of China Electronics Technology Group Corporation, Beijing 100041

Source: Software (《软件》), 2023, No. 6, pp. 9-14 (6 pages)

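Funding: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project of the Ministry of Science and Technology (2020AAA0105100).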

Abstract: When processing its business data, the All-China Federation of Industry and Commerce must handle financial, credit, and other economic data on non-public enterprises from different regions and sectors. The goal is to coordinate, across one or more organizations and one or more sectors, the related indicator data that will inform decisions on the development of the non-public economy. The federation also needs to assess how far it has gained access to the data resources of the provincial and municipal federations of industry and commerce, so that it can analyze national non-public economic data. This paper proposes a data virtualization method and uses it to connect and use cross-domain data resources. The multi-source heterogeneous non-public economic data accessed by the federation is logically virtualized to construct a logical data space, and metadata queries are optimized with a clustering algorithm. The paper also proposes an optimization method based on Spark SQL distributed queries, which applies a Catalyst automatic caching strategy and intermediate data structures, among other measures, to improve query efficiency and read performance. While preserving data accuracy, the method raises the utilization of the federation's data resources and promotes effective use of its data assets.

Keywords: data virtualization; query engine; query optimization

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
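
The abstract mentions a logical data space, clustered metadata lookup, and cached query results, but gives no implementation details. The following is only a minimal sketch of the general idea, not the authors' algorithm: it assumes metadata can be bucketed by a simple (region, domain) key as a stand-in for the paper's clustering step, and it uses a plain dictionary as a stand-in for any caching strategy. The names `SourceMeta`, `LogicalCatalog`, and the sample sources are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMeta:
    """Metadata for one physical data source (hypothetical fields)."""
    name: str
    region: str
    domain: str


class LogicalCatalog:
    """Logical view over physical sources, bucketed by (region, domain).

    Bucketing is an assumed, simplified substitute for the clustering
    algorithm named in the abstract; the dict cache is likewise only an
    illustration of reusing earlier lookups.
    """

    def __init__(self, sources):
        self._clusters = defaultdict(list)
        for source in sources:
            self._clusters[(source.region, source.domain)].append(source)
        self._cache = {}
        self.scans = 0  # number of lookups that were not served from cache

    def find(self, region, domain):
        """Return the names of sources in the matching bucket."""
        key = (region, domain)
        if key not in self._cache:
            self.scans += 1
            # Only the matching bucket is read, not the full metadata set.
            bucket = self._clusters.get(key, ())
            self._cache[key] = tuple(s.name for s in bucket)
        return self._cache[key]


catalog = LogicalCatalog([
    SourceMeta("beijing_credit_db", "Beijing", "credit"),
    SourceMeta("beijing_finance_db", "Beijing", "finance"),
    SourceMeta("shanghai_credit_db", "Shanghai", "credit"),
])
first = catalog.find("Beijing", "credit")
second = catalog.find("Beijing", "credit")  # repeated lookup hits the cache
```

In this sketch a repeated lookup for the same region and domain does not rescan the metadata, which is the kind of saving that clustered metadata and cached intermediate results aim for at much larger scale.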

 
