Short video preloading via domain knowledge assisted deep reinforcement learning  

Authors: Yuhong Xie, Yuan Zhang, Tao Lin, Zipeng Pan, Si-Ze Qian, Bo Jiang, Jinyao Yan

Affiliations: [1] School of Information and Communication Engineering, Communication University of China, Beijing 100024, China; [2] State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China; [3] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Source: Digital Communications and Networks (数字通信与网络, English edition), 2024, Issue 6, pp. 1826-1836, 11 pages

Funding: Supported by the National Key Research and Development Program of China (No. 2021YFF0900503) and partly by the National Natural Science Foundation of China (Nos. 62262018 and 61971382).

Abstract: Short video applications such as TikTok have grown significantly in recent years. A common behavior on these platforms is watching a video briefly and then swiping to the next one, which can waste a significant amount of bandwidth on content that is downloaded but never played. An important challenge in short video streaming is therefore to design a preloading algorithm that decides which videos to download, at what bitrate, and when to pause downloading, so as to reduce bandwidth waste while improving the Quality of Experience (QoE). Designing such an algorithm is non-trivial, especially because minimizing bandwidth waste and maximizing QoE are conflicting objectives. In this paper, we propose DAM, an end-to-end Deep reinforcement learning framework with Action Masking that leverages domain knowledge to learn an optimal policy for short video preloading. To achieve this, we introduce a reward shaping technique to minimize bandwidth waste, and we use action masking to make actions more reasonable, reduce playback rebuffering, and accelerate training. We have conducted extensive experiments using real-world video datasets and network traces, including 4G, WiFi, and 5G. Our results show that DAM improves the QoE score by 3.73% to 11.28% compared with state-of-the-art algorithms and achieves an average bandwidth waste of only 10.27% to 12.07%, outperforming all baseline methods.
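The abstract names reward shaping and action masking but does not give their formulas. As a rough, non-authoritative illustration of what these two techniques generally look like for a discrete-action preloading policy, the sketch below masks invalid actions before sampling and subtracts a bandwidth-waste penalty from a QoE term. Everything specific here is a hypothetical assumption, not DAM's actual design: the action layout (one "download next chunk" action per bitrate plus a final "pause" action), the masking rule in `build_mask`, the linear penalty in `shaped_reward`, and the weight `waste_weight`.

```python
import numpy as np


def masked_softmax(logits, mask):
    """Convert policy logits to probabilities, giving zero mass to masked actions.

    mask[i] is True when action i is allowed; at least one action must be allowed.
    """
    logits = np.asarray(logits, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    masked = np.where(mask, logits, -np.inf)  # invalid actions get -inf logits
    masked = masked - masked[mask].max()      # shift by the max valid logit for stability
    exp = np.exp(masked)                      # exp(-inf) == 0, so invalid actions get 0
    return exp / exp.sum()


def build_mask(buffer_s, max_buffer_s, n_bitrates):
    """Hypothetical domain-knowledge rule (an assumption, not the paper's rule).

    Actions 0..n_bitrates-1 download the next chunk at that bitrate; the last
    action pauses. Downloads are masked once the buffer is full (to limit waste),
    and pausing is masked when the buffer is empty (to avoid rebuffering).
    """
    can_download = buffer_s < max_buffer_s
    can_pause = buffer_s > 0.0
    return [can_download] * n_bitrates + [can_pause]


def shaped_reward(qoe, wasted_mb, waste_weight=0.5):
    """Hypothetical shaped reward: QoE minus a linear penalty on wasted megabytes."""
    return qoe - waste_weight * wasted_mb


# Usage: with an empty buffer the pause action is masked out entirely.
probs = masked_softmax([1.0, 2.0, 0.5, 3.0],
                       build_mask(buffer_s=0.0, max_buffer_s=10.0, n_bitrates=3))
print(probs)
```

Masking at the probability level, rather than penalizing bad actions through the reward alone, means the agent never samples an action the domain rule forbids. This is one common way masking can speed up training, since no episodes are spent learning to avoid those actions.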

Keywords: Short video preloading; Deep reinforcement learning; Reward shaping; Action masking; Domain knowledge

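CLC classification: TN919.8 [Electronics and Telecommunications: Communication and Information Systems]; TP18 [Electronics and Telecommunications: Information and Communication Engineering]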