Low Bit Rate Generative Drone Video Compression


Authors: LIU Meiqin [1,2], CHEN Hongyu, ZHOU Yiming, NI Wenhao

Affiliations: [1] Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China; [2] Visual Intelligence+X International Cooperation Joint Laboratory (Ministry of Education), Beijing Jiaotong University, Beijing 100044, China

Source: Journal of Data Acquisition and Processing (《数据采集与处理》), 2025, No. 2, pp. 320-333 (14 pages)

Funding: National Natural Science Foundation of China (62372036).

Abstract: In complex environments spanning air, space, land, and sea, the massive volume of video data places tremendous pressure on limited transmission bandwidth and storage devices, so improving the coding efficiency of video compression at low bit rates is crucial. In recent years, deep learning-based video compression algorithms have made significant progress, yet issues such as model design flaws, mismatches between optimization objectives and perceptual quality, and biases in training data distributions degrade visual perceptual quality at extremely low bit rates. Generative coding, by learning the data distribution, effectively improves texture and structure restoration at low bit rates and alleviates the blur artifacts of deep video compression. However, existing research still faces two major bottlenecks: first, temporal correlation modeling is insufficient, so inter-frame correlations are lost; second, a dynamic bit allocation mechanism is lacking, which makes adaptive extraction of key information difficult. To address this, the paper proposes a video coding algorithm based on a conditional guided diffusion model, termed conditional guided diffusion model video compression (CGDMVC), which aims to improve perceptual video quality at low bit rates while strengthening inter-frame feature modeling and preserving key information. Specifically, the algorithm adopts an implicit inter-frame alignment strategy that uses the diffusion model to capture latent inter-frame features, reducing the computational complexity of estimating explicit motion information. Meanwhile, an adaptive spatiotemporal importance-aware coder dynamically allocates bit rate to optimize the generation quality of key regions. Furthermore, a perceptual loss function is introduced, combined with a learned perceptual image patch similarity (LPIPS) constraint, to improve the visual fidelity of reconstructed frames. Experimental results show that, compared with algorithms such as DCVC (deep contextual video compression), the proposed algorithm reduces LPIPS by 36.49% on average at low bit rates (<0.1 BPP), yielding richer texture details and more natural visual results.
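The two quantities quoted in the abstract, bits per pixel (BPP) and a relative LPIPS reduction, can be illustrated with a minimal sketch. The function names and all numbers below are hypothetical and chosen only for illustration; they are not taken from the paper, and the record does not say how the authors average results across sequences or rate points.

```python
def bits_per_pixel(total_bits: int, width: int, height: int, num_frames: int) -> float:
    """Average number of coded bits spent per pixel over a whole sequence."""
    return total_bits / (width * height * num_frames)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline` (lower LPIPS is better)."""
    return (baseline - proposed) / baseline * 100.0


# Hypothetical example: a 50-frame 1920x1080 clip coded with 8,294,400 bits.
bpp = bits_per_pixel(total_bits=8_294_400, width=1920, height=1080, num_frames=50)

# Hypothetical LPIPS scores for a baseline codec and a proposed codec.
lpips_gain = relative_reduction(baseline=0.40, proposed=0.30)

print(f"BPP = {bpp:.3f}, LPIPS reduction = {lpips_gain:.1f}%")
```

With these made-up inputs the clip falls in the "<0.1 BPP" regime the abstract refers to, and the LPIPS score drops by a quarter relative to the baseline.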

Keywords: video coding; diffusion model; perceptual quality; inter-frame alignment; low bit rate

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
