GPU Usage Time-Based Ordering Management Technique for Tasks Execution to Prevent Running Failures of GPU Tasks in Container Environments

Authors: Joon-Min Gil, Hyunsu Jeong, Jihun Kang

Affiliations: [1] Department of Computer Engineering, Jeju National University, Jeju-do, 63243, Republic of Korea; [2] Department of Computer Science, Korea National Open University, Seoul, 03087, Republic of Korea

Source: Computers, Materials & Continua, 2025, Issue 2, pp. 2199-2213 (15 pages; English-language journal)

Funding: Supported by the "Regional Innovation Strategy (RIS)" through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (MOE) (2023RIS-009).

Abstract: In a cloud environment, graphics processing units (GPUs) are the primary devices used for high-performance computation. They exploit flexible resource utilization, a key advantage of cloud environments. Multiple users share GPUs, which serve as coprocessors of central processing units (CPUs) and are activated only when tasks demand GPU computation. In a container environment, where resources can be shared among multiple users, GPU utilization can be increased by minimizing idle time because the tasks of many users run on a single GPU. However, unlike CPUs and memory, GPUs cannot logically multiplex their resources. Additionally, GPU memory does not support over-utilization: when it runs out, tasks fail. Therefore, the execution order of concurrently running GPU tasks must be regulated to avoid such failures and to ensure equitable GPU sharing among users. In this paper, we propose a GPU task execution order management technique that controls containers based on GPU usage time. The technique seeks to ensure equal GPU time among users in a container environment while preventing task failures. It uses a deferred processing method to prevent GPU memory shortages when GPU tasks are executed simultaneously and to determine the execution order based on GPU usage time. Because the order of GPU tasks cannot be adjusted externally once a task has started, a GPU task is paused indirectly by pausing its container. In addition, because the container pause/unpause decision is based on the available GPU memory capacity, overuse of GPU memory can be prevented at the source. Experimental results show that the technique prevents task failures and processes GPU tasks in an appropriate order.

Keywords: Cloud computing; container; GPGPU; resource management

Classification: TP302.7 [Automation and Computer Technology: Computer Systems Architecture]
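
The abstract describes deferring GPU tasks by pausing their containers, ordering execution by GPU usage time, and admitting tasks only while GPU memory is available. The paper's actual algorithm is not given in this record, so the following is only a minimal sketch of that idea under stated assumptions: each task's GPU memory need is known in advance, tasks with the least accumulated GPU time go first, and the names `GpuTask` and `plan` are hypothetical. In a real system the "paused" decision would presumably map to pausing the container (for example with `docker pause` and `docker unpause`); the sketch only computes the decision.

```python
from dataclasses import dataclass


@dataclass
class GpuTask:
    """Hypothetical record for one container's GPU task."""
    name: str
    mem_mib: int             # GPU memory the task needs (assumed known up front)
    gpu_time_s: float = 0.0  # GPU time this user has consumed so far


def plan(tasks, total_mem_mib):
    """Split tasks into (run, paused) lists of task names.

    Assumption: tasks with the least accumulated GPU time are admitted first.
    A task is deferred (its container would be paused) when admitting it
    would exceed the GPU memory capacity.
    """
    run, paused = [], []
    free = total_mem_mib
    # Sort by accumulated GPU time; the name breaks ties deterministically.
    for task in sorted(tasks, key=lambda t: (t.gpu_time_s, t.name)):
        if task.mem_mib <= free:
            run.append(task.name)
            free -= task.mem_mib
        else:
            paused.append(task.name)
    return run, paused


tasks = [
    GpuTask("a", mem_mib=6000, gpu_time_s=30.0),
    GpuTask("b", mem_mib=5000, gpu_time_s=10.0),
    GpuTask("c", mem_mib=4000, gpu_time_s=20.0),
]
run, paused = plan(tasks, total_mem_mib=10000)
print(run, paused)  # ['b', 'c'] ['a']
```

Here tasks "b" and "c" fit together in the 10000 MiB of GPU memory, while "a", which has already used the most GPU time, is deferred until memory is released. Re-running `plan` whenever a task finishes or usage times are updated would give the periodic pause/unpause decision.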
