Authors: XIONG Haoxuan (熊皓萱); XU Yuanyuan (徐媛媛) [1]; ZHU Kun (朱琨) [2] (College of Computer Science and Software Engineering, Hohai University, Nanjing, Jiangsu 211100, China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China)
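Affiliations: [1] College of Computer Science and Software Engineering, Hohai University, Nanjing, Jiangsu 211100, China; [2] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China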
Source: Journal of Signal Processing (《信号处理》), 2025, No. 2, pp. 350-358 (9 pages)
Funding: National Natural Science Foundation of China (62061146002).
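Abstract: In recent years, with the wide application of computer vision in fields such as intelligent surveillance and autonomous driving, a growing share of video is no longer only watched by humans but is also analyzed directly and automatically by machine vision algorithms. Storing and transmitting such video efficiently for machine vision has become a new challenge. However, existing video coding standards, such as the latest Versatile Video Coding (VVC/H.266), are optimized mainly for the characteristics of human vision and do not adequately account for the effect of compression on the performance of machine vision tasks. To address this problem, the paper takes multiple object tracking as a representative machine vision video task and proposes a machine-vision-oriented VVC intra coding algorithm. First, a neural network interpretability method, Gradient-weighted Class Activation Mapping (GradCAM++), is used to perform saliency analysis on the video content, locating the regions the machine vision task attends to and representing them as a saliency map. Next, to emphasize the key edge and contour information in the picture, edge detection is introduced and its result is fused with the saliency analysis result to obtain the final machine vision saliency map. Finally, the fused machine vision saliency map is used to improve VVC mode selection, optimizing the mode decisions for block partitioning and intra prediction. Machine vision distortion replaces the original signal distortion in the rate-distortion optimization formula, so that during compression the encoder preserves as much as possible the information most relevant to the vision task. Experimental results show that, compared with the VVC baseline, the proposed method saves 12.7% of the bitrate while maintaining the same machine vision detection accuracy.

Abstract (published English version): Recently, the proliferation of computer vision applications in areas such as intelligent surveillance, autonomous driving, and robotics has resulted in a surge in the volume of video data. These videos are increasingly processed and analyzed by intelligent algorithms, rather than being solely consumed by humans. Consequently, efficient storage and transmission of video data for machine vision tasks have become new challenges. The latest video coding standard, Versatile Video Coding (VVC or H.266), represents the state of the art in video compression for human viewers. It aims to provide better quality at lower bitrates by optimizing for the characteristics of the human visual system. However, VVC does not account for the specific requirements of machine vision tasks, which leads to critical information loss during compression. Consequently, the performance of machine vision algorithms may degrade significantly when working with compressed video. This gap indicates the need for a specialized video coding approach that considers the unique requirements of machine vision. To address this problem, this paper proposes a novel VVC intra-coding scheme that optimizes VVC specifically for machine vision tasks. Our approach takes multiple object tracking, a common task in machine vision, as a typical example to demonstrate the effectiveness of the proposed solution. First, the proposed scheme begins by analyzing the video content using a neural network interpretability method known as Gradient-weighted Class Activation Mapping (GradCAM++). This method is typically used to highlight areas of an image that are most relevant to the decision-making process of a neural network. By applying GradCAM++ to the video frames, we generate saliency maps that reveal the regions of interest for machine vision. Subsequently, to highlight the critical edge contour information in the frame, this paper introduces edge detection and fuses it with the saliency analysis results to obtain the final machine vision saliency map. Finally, the process of VVC mode selection is impr…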
Keywords: machine vision coding; saliency analysis; intra coding; Versatile Video Coding
Classification No.: TP37 [Automation and Computer Technology: Computer System Architecture]
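
Illustrative sketch: the abstract describes fusing a saliency map with an edge map and then substituting a machine vision distortion for the signal distortion in the rate-distortion cost. The record does not give the fusion rule, the exact form of that distortion, or any encoder interface, so everything below is an assumption made for illustration: the convex-combination fusion in `fuse_maps`, the saliency-weighted squared error in `weighted_distortion`, and all function and parameter names are hypothetical. Only the Lagrangian form J = D + λR is standard. This is a toy NumPy model, not the authors' implementation and not part of any VVC encoder.

```python
import numpy as np

def normalize(m):
    """Scale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=np.float64)
    span = m.max() - m.min()
    return np.zeros_like(m) if span == 0 else (m - m.min()) / span

def fuse_maps(saliency, edges, alpha=0.5):
    """Assumed fusion rule: convex combination of the two normalized maps."""
    return alpha * normalize(saliency) + (1.0 - alpha) * normalize(edges)

def weighted_distortion(orig, recon, weights):
    """Assumed stand-in for machine vision distortion:
    squared error per pixel, weighted by the fused map, summed over the block."""
    diff = np.asarray(orig, dtype=np.float64) - np.asarray(recon, dtype=np.float64)
    return float(np.sum(weights * diff ** 2))

def rd_cost(distortion, rate_bits, lam):
    """Standard Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def pick_mode(orig, candidates, weights, lam):
    """Return the name of the candidate with the lowest cost.
    candidates: list of (name, reconstructed_block, rate_bits) tuples."""
    best = min(
        candidates,
        key=lambda c: rd_cost(weighted_distortion(orig, c[1], weights), c[2], lam),
    )
    return best[0]
```

With uniform weights the cost reduces to the usual sum of squared errors plus λR. With a fused map, errors in low-weight regions cost less, so a candidate that places its error there can win even at a higher rate, which is the qualitative behavior the abstract describes.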