Authors: Yuhong Xie; Yuan Zhang; Tao Lin; Zipeng Pan; Si-Ze Qian; Bo Jiang; Jinyao Yan
Affiliations: [1] School of Information and Communication Engineering, Communication University of China, Beijing 100024, China; [2] State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China; [3] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Source: Digital Communications and Networks (数字通信与网络(英文版)), 2024, No. 6, pp. 1826-1836 (11 pages)
Funding: Supported by the National Key Research and Development Program of China (No. 2021YFF0900503) and partly by the National Natural Science Foundation of China (Nos. 62262018, 61971382).
Abstract: Short video applications like TikTok have seen significant growth in recent years. One common behavior of users on these platforms is watching and swiping through videos, which can lead to a significant waste of bandwidth. As such, an important challenge in short video streaming is to design a preloading algorithm that can effectively decide which videos to download, at what bitrate, and when to pause the download in order to reduce bandwidth waste while improving the Quality of Experience (QoE). However, designing such an algorithm is non-trivial, especially when considering the conflicting objectives of minimizing bandwidth waste and maximizing QoE. In this paper, we propose an end-to-end Deep reinforcement learning framework with Action Masking, called DAM, that leverages domain knowledge to learn an optimal policy for short video preloading. To achieve this, we introduce a reward shaping technique to minimize bandwidth waste and use action masking to make actions more reasonable, reduce playback rebuffering, and accelerate the training process. We have conducted extensive experiments using real-world video datasets and network traces including 4G, Wi-Fi, and 5G. Our results show that DAM improves the QoE score by 3.73%-11.28% compared to state-of-the-art algorithms and achieves an average bandwidth waste of only 10.27%-12.07%, outperforming all baseline methods.
Keywords: Short video preloading; Deep reinforcement learning; Reward shaping; Action masking; Domain knowledge
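Classification: TN919.8 [Electronics and Telecommunications / Communication and Information Systems]; TP18 [Electronics and Telecommunications / Information and Communication Engineering]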