An Optimized Hadoop Data Placement Strategy (cited by: 1)

Author: WU Yue (State Forestry and Grassland Administration Industrial Development Planning Institute, Beijing 100010, China)

Affiliation: [1] State Forestry and Grassland Administration Industrial Development Planning Institute, Beijing 100010

Source: Software Engineering, 2023, No. 7, pp. 44-47 (4 pages)

Abstract: The default data block placement strategy of the Hadoop Distributed File System (HDFS) balances storage reliability against read/write speed, but it does not try to maximize cluster performance. To address this, the paper proposes an optimized data block placement algorithm. The algorithm defines two indicators for each data block, query rate and average read time, and uses them to evaluate how much demand the cluster's tasks place on that block. While still following the basic rules of the HDFS default placement algorithm, it analyzes block demand, recalculates block placement, and moves the most-demanded data to the nodes that can process it fastest. Experimental results show that the algorithm improves overall cluster performance by more than 20%. The optimized placement algorithm is effective and does not increase the cluster's bandwidth usage.

Keywords: HDFS; data block placement strategy; performance optimization

Classification: TP311.1 [Automation and Computer Technology: Computer Software and Theory]
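
The abstract names the two indicators and the overall idea but gives no formulas, so the following is only a minimal sketch of how such a demand-driven placement might look, not the paper's algorithm. It assumes that demand is the product of query rate and average read time, that each node has a known relative speed and a number of free block slots, and that the HDFS default rules can be approximated by two checks: no two replicas of a block on one node, and replicas spread over at least two racks. All names (`Node`, `Block`, `demand`, `allowed`, `plan_placement`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    speed: float      # relative processing speed (assumed metric, higher is faster)
    capacity: int     # free block slots available for relocated blocks


@dataclass
class Block:
    block_id: str
    query_rate: float       # share of jobs that read this block (assumed definition)
    avg_read_time: float    # mean read time in seconds (assumed definition)
    # Names of nodes that hold the block's other replicas and are left in place.
    fixed_replicas: list = field(default_factory=list)


def demand(block):
    """Assumed demand score: frequently read, slow-to-read blocks rank highest."""
    return block.query_rate * block.avg_read_time


def allowed(block, node, nodes_by_name):
    """Simplified stand-in for the HDFS default placement rules."""
    if node.name in block.fixed_replicas:
        return False  # never two replicas of one block on the same node
    if not block.fixed_replicas:
        return True   # a single replica has no rack constraint in this sketch
    racks = {nodes_by_name[n].rack for n in block.fixed_replicas}
    racks.add(node.rack)
    return len(racks) >= 2  # replicas must span at least two racks


def plan_placement(blocks, nodes):
    """Greedy plan: highest-demand block first, onto the fastest allowed node."""
    nodes_by_name = {n.name: n for n in nodes}
    free = {n.name: n.capacity for n in nodes}
    fastest_first = sorted(nodes, key=lambda n: n.speed, reverse=True)
    plan = {}
    for block in sorted(blocks, key=demand, reverse=True):
        for node in fastest_first:
            if free[node.name] > 0 and allowed(block, node, nodes_by_name):
                plan[block.block_id] = node.name
                free[node.name] -= 1
                break  # blocks with no allowed node are simply left out of the plan
    return plan


# Made-up example data: three nodes on two racks, three blocks.
nodes = [
    Node("n1", "r1", speed=3.0, capacity=1),
    Node("n2", "r1", speed=2.0, capacity=1),
    Node("n3", "r2", speed=1.0, capacity=2),
]
blocks = [
    Block("a", query_rate=0.1, avg_read_time=2.0, fixed_replicas=["n3"]),
    Block("b", query_rate=0.6, avg_read_time=4.0, fixed_replicas=["n3"]),
    Block("c", query_rate=0.3, avg_read_time=1.0, fixed_replicas=["n1", "n2"]),
]
plan = plan_placement(blocks, nodes)
print(plan)  # {'b': 'n1', 'c': 'n3', 'a': 'n2'}
```

In this made-up example block `b` has the highest score and takes the single slot on the fastest node, while `c` is pushed to rack `r2` because its other replicas already sit on `n1` and `n2`. A greedy pass is used only because it is the simplest way to express "most-demanded data to the fastest node"; the paper may weight the indicators or resolve capacity conflicts differently.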
