Eye fixation prediction combining multiple attention mechanisms


Authors: Kong Li, Hu Xuemin, Wang Ding, Liu Yanfang, Zhang Yan, Chen Long [2]

Affiliations: [1] School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China; [2] School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China

Source: Journal of Image and Graphics (《中国图象图形学报》), 2022, No. 12, pp. 3503-3515 (13 pages)

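Funding: National Natural Science Foundation of China (62273135, 61806076); Natural Science Foundation of Hubei Province (2021CFB460); Major Project of the Hubei Province Technological Innovation Special Fund (2019ACA144).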

Abstract: Objective Classical eye fixation prediction models usually fuse high-level and low-level features through skip connections, which makes it difficult to weigh the importance of features at different levels, and they do not account for the tendency of human observers to favor the central region of an image. To address this, we propose an image feature extraction method that incorporates attention mechanisms and refine the extracted features with a Gaussian learning module, improving the accuracy of eye fixation prediction. Method We propose a new eye fixation prediction model based on a multiple attention mechanism (MAM). It combines three different attention mechanisms to weight the features extracted by a ResNet-50 model with dilated convolution along the spatial, channel, and layer dimensions. The network consists mainly of a feature extraction module, a multiple attention module, and a Gaussian learning optimization module. Dilated convolution effectively captures receptive fields of different sizes while keeping the feature map resolution unchanged; the multiple attention module automatically refines the rich low-level detail and the high-level global semantic information and fully exploits the channel and spatial information of the feature maps, which prevents over-reliance on the model's high-level features; the Gaussian learning module automatically selects suitable Gaussian blur kernels to blur the saliency map, addressing the center bias in human viewing. Results Experiments on the public SALICON (saliency in context) dataset show that, on the Kullback-Leibler divergence (KLD), shuffled area under ROC curve (sAUC), and information gain (IG) metrics, the proposed method improves by 33%, 0.3%, and 6% over the SAM-Res (saliency attentive model) model of the same structure, and by 53%, 0.5%, and 192% over the DINet (dilated inception network) model. Conclusion The experimental results show that the proposed model extracts features across space, channels, and layers through weighting and outperforms mainstream models on most eye fixation prediction metrics.

Objective Human eye fixation prediction has been developing within image-related computer vision in recent years. The distinctive salient regions of an image are selected to better capture visual structure. Recent saliency models have been developed through salient object detection, object segmentation and image cropping. Traditional approaches focus on hand-crafted features based on low-level cues (e.g., contrast, texture, color) for saliency prediction. However, these features easily fail to simulate the complex activation of the human visual system, especially in complex scenarios. Existing eye fixation prediction models often use skip connections to fuse high-level and low-level features, which makes it difficult to weigh the importance of features at different levels, and they do not account for gaze being biased toward the center. Humans are commonly inclined to look at the center of an image when there are no obvious salient regions. We develop a layer attention mechanism that assigns different weights to the features of different layers for selective layer feature extraction, and we integrate channel attention and spatial attention mechanisms to selectively extract different channel and spatial features from the convolutional features. In addition, we introduce a Gaussian learning method to address the center prior and improve prediction accuracy. Method Our eye fixation prediction model is based on a multiple attention mechanism network (MAM-Net), which uses three different attention mechanisms to weight the feature information of different layers, different channels, and different image pixels extracted by a ResNet-50 model with dilated convolution. The network is mainly composed of a feature extraction module, the novel multiple attention mechanism (MAM) module, and a Gaussian learning optimization module. 1) A dilated convolution network is used to capture long-range information by extracting local and global feature maps, which cover many different receptive fields. 2) A MAM
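The abstract describes weighting features along three axes (channel, spatial position, and layer) but does not give the formulas. The following is a minimal NumPy sketch of one generic way such weighting can be done, not the authors' implementation. The function names, the sigmoid gating, the softmax over layers, and the parameters `w`, `v`, and `scores` are all illustrative assumptions; in a trained network they would be learned. It also assumes that all layer feature maps share one shape, which is a simplification.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def channel_attention(feat, w):
    """Scale each channel of feat (C, H, W) by a gate in (0, 1).

    w is a hypothetical learned (C, C) matrix (assumption).
    """
    pooled = feat.mean(axis=(1, 2))        # global average pool -> (C,)
    gates = sigmoid(w @ pooled)            # one gate per channel -> (C,)
    return feat * gates[:, None, None]

def spatial_attention(feat, v):
    """Scale each pixel of feat (C, H, W) by a gate in (0, 1).

    v is a hypothetical learned (C,) vector acting like a 1x1 convolution
    that collapses the channels into one logit per pixel (assumption).
    """
    logits = np.tensordot(v, feat, axes=(0, 0))   # (H, W)
    return feat * sigmoid(logits)[None, :, :]

def layer_attention(feats, scores):
    """Fuse same-shaped feature maps from several layers by a weighted sum.

    scores holds one hypothetical learned logit per layer (assumption);
    softmax turns the logits into weights that sum to 1.
    """
    weights = softmax(np.asarray(scores, dtype=float))
    stacked = np.stack(feats)                          # (L, C, H, W)
    return np.tensordot(weights, stacked, axes=(0, 0)) # (C, H, W)

# Usage with random stand-ins for three layers of extracted features.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((4, 5, 6)) for _ in range(3)]
w = rng.standard_normal((4, 4))
v = rng.standard_normal(4)
refined = [spatial_attention(channel_attention(f, w), v) for f in feats]
fused = layer_attention(refined, [0.0, 0.0, 0.0])  # equal logits -> plain average
```

With equal layer logits the fusion reduces to a plain average of the layers, which is roughly what an unweighted skip connection does; unequal logits let the fusion favor low-level or high-level features.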

Keywords: eye fixation prediction; multiple attention; layer attention; channel attention; spatial attention; Gaussian learning

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
