Quality Assessment of Light Field Images Based on Contrastive Visual-Textual Model


Authors: WANG Han-ling, KE Xiao [1,3,4], JIANG Ao-xin, GUO Wen-zhong

Affiliations: [1] College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian 350116, China; [2] Key Laboratory of Earthquake Engineering and Engineering Vibration, Institute of Engineering Mechanics, China Earthquake Administration, Harbin, Heilongjiang 150080, China; [3] Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, Fuzhou University, Fuzhou, Fujian 350116, China; [4] Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou, Fujian 350116, China

Source: Acta Electronica Sinica, 2024, No. 10, pp. 3562-3577 (16 pages)

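Funding: National Key Research and Development Program of China (No. 2021YFB3600503); National Natural Science Foundation of China (No. 61972097, No. U21A20472); Fujian Provincial Science and Technology Major Project (No. 2021HZ022007); Natural Science Foundation of Fujian Province (No. 2021J01612).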

Abstract: Light field images capture the light information at every position in a scene and have broad application prospects in electronic imaging, medical imaging, and virtual reality. Light field image quality assessment (LFIQA) aims to measure the quality of such images, yet current methods face a significant challenge in the heterogeneity between the visual and textual modalities. To address this problem, this paper proposes a multi-modal LFIQA model based on text-vision integration. Specifically, for the visual modality, we design a multi-task model that incorporates an edge auto-thresholding algorithm to enrich the key representational features of light field images. For the textual modality, we accurately identify the noise category of a light field image by comparing input noise features with predicted noise features, and we verify the importance of noise prediction for optimizing visual representations. Building on these findings, we further propose an optimized general-purpose noise text configuration method which, combined with an edge enhancement strategy, notably improves the accuracy and generalization ability of the baseline model in LFIQA. Ablation experiments assess the contribution of each component to overall model performance and verify the effectiveness and robustness of the proposed method. Experimental results show that the approach performs well on the public datasets Win5-LID and NBU-LF1.0 and also on fused datasets. Compared with state-of-the-art algorithms, our method improves performance by 2% and 6%, respectively, on the two databases. The noise verification strategy and configuration method presented in this paper provide a useful reference for noise prediction tasks in image quality assessment and can also serve as auxiliary tasks for other types of noise prediction.
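The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind contrastive visual-textual matching: an image embedding is compared with the embeddings of several text prompts, and the most similar prompt is taken as the predicted noise category. It is not the authors' model. The function names, the prompt labels, the temperature value, and the synthetic embeddings are all assumptions made for illustration; in practice the embeddings would come from trained image and text encoders.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def match_noise_category(image_emb, text_embs, temperature=0.07):
    """Softmax over cosine similarities between one image and K text prompts.

    image_emb: shape (D,), text_embs: shape (K, D).
    The temperature of 0.07 is an assumed value, not taken from the paper.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=np.float64))
    txt = l2_normalize(np.asarray(text_embs, dtype=np.float64))
    logits = txt @ img / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Hypothetical noise-category prompts (not the categories used in the paper).
labels = ["JPEG compression", "Gaussian blur", "white noise"]

# Synthetic stand-ins for encoder outputs: the image embedding is built to
# lie close to the second prompt embedding.
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 16))
image_emb = text_embs[1] + 0.1 * rng.normal(size=16)

probs = match_noise_category(image_emb, text_embs)
predicted = labels[int(np.argmax(probs))]
print(predicted, probs)
```

Because the synthetic image embedding is constructed near the second prompt, the sketch selects that prompt. With real encoders the same similarity-plus-softmax step applies, and the predicted category could then feed an auxiliary noise-prediction task of the kind the abstract describes.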

Keywords: image quality assessment; light field image; visual-textual model; multi-task mode; noise prediction; image enhancement

Classification No.: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]

 
