GPU Usage Time-Based Ordering Management Technique for Tasks Execution to Prevent Running Failures of GPU Tasks in Container Environments  

作  者:Joon-Min Gil Hyunsu Jeong Jihun Kang 

机构地区:[1]Department of Computer Engineering,Jeju National University,Jeju-do,63243,Republic of Korea [2]Department of Computer Science,Korea National Open University,Seoul,03087,Republic of Korea

出  处:《Computers, Materials & Continua》2025年第2期2199-2213,共15页计算机、材料和连续体(英文)

基  金:supported by“Regional Innovation Strategy(RIS)”through the National Research Foundation of Korea(NRF)funded by the Ministry of Education(MOE)(2023RIS-009).

摘  要:In a cloud environment,graphics processing units(GPUs)are the primary devices used for high-performance computation.They exploit flexible resource utilization,a key advantage of cloud environments.Multiple users share GPUs,which serve as coprocessors of central processing units(CPUs)and are activated only if tasks demand GPU computation.In a container environment,where resources can be shared among multiple users,GPU utilization can be increased by minimizing idle time because the tasks of many users run on a single GPU.However,unlike CPUs and memory,GPUs cannot logically multiplex their resources.Additionally,GPU memory does not support over-utilization:when it runs out,tasks will fail.Therefore,it is necessary to regulate the order of execution of concurrently running GPU tasks to avoid such task failures and to ensure equitable GPU sharing among users.In this paper,we propose a GPU task execution order management technique that controls GPU usage via time-based containers.The technique seeks to ensure equal GPU time among users in a container environment to prevent task failures.In the meantime,we use a deferred processing method to prevent GPU memory shortages when GPU tasks are executed simultaneously and to determine the execution order based on the GPU usage time.As the order of GPU tasks cannot be externally adjusted arbitrarily once the task commences,the GPU task is indirectly paused by pausing the container.In addition,as container pause/unpause status is based on the information about the available GPU memory capacity,overuse of GPU memory can be prevented at the source.As a result,the strategy can prevent task failure and the GPU tasks can be experimentally processed in appropriate order.

关 键 词:Cloud computing CONTAINER GPGPU resource management 

分 类 号:TP3[自动化与计算机技术—计算机科学与技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象