Research on Improved Parallel I/O for Data Distribution System


Authors: XIAO Zhao-di, HUANGFU Han-cong, YU Yong-zhong, LV Shun-feng

Affiliations: [1] Foshan Power Supply Bureau, Guangdong Power Grid Co., Ltd., Foshan, Guangdong 528000, China; [2] Guangdong Zhuo Wei Network Co., Ltd., Foshan, Guangdong 528000, China

Source: Techniques of Automation and Applications, 2018, No. 10, pp. 38-42 (5 pages)

Abstract: As the number of users and the complexity of business grow, the capacity of a data warehouse to serve data externally urgently needs to improve. A data distribution system, acting as a unified interface for distribution management, inevitably faces communication blocking caused by concurrent data access from many users. This paper builds data distribution applications with the open-source Kettle tool and applies parallel computing to improve the efficiency of the serial algorithm. For the parallelization, the paper describes in detail the traditional data distribution and collection parallel I/O scheme and constructs a time estimation equation for it. Based on an analysis of that scheme's bottleneck, and drawing on the ideas of the Google File System, it proposes an improved metadata-based parallel I/O scheme. Experiments show that, regardless of the number of parallel computing processes (computing units), the metadata-based parallel I/O scheme outperforms the data distribution and collection scheme, with shorter data import and export times.
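The abstract contrasts two parallel I/O schemes without giving their details, so the following Python sketch only illustrates the general GFS-style idea under our own assumptions. In the data distribution and collection baseline, one master process reads the whole input and hands a slice to each worker; in the metadata-based variant, the master only computes per-worker (offset, length) metadata, and each worker reads its own byte range directly from the file. The function names (`plan_chunks`, `read_chunk`, `metadata_based_read`, `distribute_collect_read`) and the equal byte-range partitioning are hypothetical and are not taken from the paper, which builds its applications with Kettle.

```python
import os
import tempfile  # used only by the usage example
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(file_size, num_workers):
    """Hypothetical master step: produce metadata only, one (offset, length)
    pair per worker, without reading any file contents."""
    base, extra = divmod(file_size, num_workers)
    chunks, offset = [], 0
    for i in range(num_workers):
        # The first `extra` workers take one extra byte so sizes differ by at most 1.
        length = base + (1 if i < extra else 0)
        chunks.append((offset, length))
        offset += length
    return chunks


def read_chunk(path, offset, length):
    """Hypothetical worker step: open the file independently and read only
    the assigned byte range."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def metadata_based_read(path, num_workers):
    """Metadata-based variant (assumed): the master hands out offsets, and the
    workers perform the actual reads concurrently."""
    chunks = plan_chunks(os.path.getsize(path), num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda c: read_chunk(path, *c), chunks))


def distribute_collect_read(path, num_workers):
    """Distribution/collection baseline (assumed): a single master reads all
    the data, then splits it into one slice per worker."""
    with open(path, "rb") as f:
        data = f.read()
    return [data[o:o + n] for o, n in plan_chunks(len(data), num_workers)]


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 40)
        path = tmp.name
    try:
        parts = metadata_based_read(path, 4)
        print([len(p) for p in parts])
    finally:
        os.remove(path)
```

Both functions return the same list of byte slices; what differs is where the reads happen. In the baseline, every byte passes through the master, which is one plausible source of the kind of bottleneck the abstract refers to, whereas in the metadata variant the master handles only a few integers per worker.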

Keywords: data distribution; parallel computing; parallel I/O; Google File System; metadata

CLC number: TP311 [Automation and Computer Technology / Computer Software and Theory]
