Gaze Target Detection Network Based on Attention Mechanism and Depth Prior

Authors: ZHU Yun (朱芸), ZHU Dongchen (朱冬晨), ZHANG Guanghui (张广慧), SUN Yanzan (孙彦赞) [1]; ZHANG Xiaolin (张晓林) [2,3,4,5]

Affiliations: [1] School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China; [2] Bionic Vision System Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China; [3] University of Science and Technology of China, Hefei 230026, China; [4] ShanghaiTech University, Shanghai 201210, China; [5] Xiong'an Institute of Innovation, Chinese Academy of Sciences, Xiong'an, Hebei 071702, China

Source: Computer Engineering and Applications (《计算机工程与应用》), 2024, No. 14, pp. 240-249 (10 pages)

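Funding: Shanghai Municipal Major Science and Technology Project "Basic, Translational and Applied Research on Brain and Brain-Inspired Intelligence" (2018SHZDZX01).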

Abstract: Human gaze behavior, as a non-verbal cue, plays a crucial role in revealing human intentions, and gaze target detection has attracted extensive attention from the machine vision community. However, existing methods mostly focus on extracting texture information from images and ignore the importance of stereo depth information for gaze target estimation, which makes it difficult for them to handle scenes with complex textures. To address this, a novel gaze target detection network based on an attention mechanism and a depth prior is proposed. It adopts a two-stage architecture: a gaze direction prediction stage and a saliency detection stage. In the gaze direction prediction stage, a channel-spatial attention module is established to recalibrate texture features, and a head position encoding branch is designed, yielding highly representative features enhanced with texture and head-position awareness for accurate gaze direction prediction. Furthermore, a strategy is proposed that introduces depth, which represents the stereoscopic or distance information of the 3D scene, as a prior into the saliency detection stage. At the same time, the channel-spatial attention mechanism is used to enhance multi-scale texture features, so that the advantages of depth geometric information and image texture information are fully exploited to improve the accuracy of gaze target detection. Experimental results show that the proposed model performs favorably against state-of-the-art methods on the GazeFollow and DLGaze datasets.
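The abstract does not specify how the channel-spatial attention module is built, so the following is only a minimal sketch of the general idea of recalibrating a feature map first per channel and then per spatial location. Everything in it is an assumption made for illustration: the function names, the use of plain averages passed through a sigmoid, and the absence of learned layers (a trained network would normally compute these weights with learned convolutions or fully connected layers). It is not the authors' implementation.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_attention(feat):
    """Scale each channel by the sigmoid of its global average.

    feat is a list of C channels, each an H x W list of lists.
    Assumption: no learned layers, only average pooling + sigmoid.
    """
    weights = []
    for channel in feat:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values)))
    return [[[v * w for v in row] for row in channel]
            for channel, w in zip(feat, weights)]


def spatial_attention(feat):
    """Scale each location by the sigmoid of its mean across channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    mask = [[sigmoid(sum(feat[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]
    return [[[feat[k][i][j] * mask[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]


def channel_spatial_attention(feat):
    """Apply channel recalibration first, then spatial recalibration."""
    return spatial_attention(channel_attention(feat))


# Hypothetical 2-channel, 2x2 feature map used only to exercise the sketch.
example = [[[0.0, 0.0], [0.0, 0.0]],
           [[2.0, 2.0], [2.0, 2.0]]]
recalibrated = channel_spatial_attention(example)
```

The output has the same shape as the input; channels and locations with stronger average responses are attenuated less than weak ones, which is the sense in which such a module "recalibrates" texture features.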

Keywords: gaze target detection; attention mechanism; depth prior; feature fusion; neural network

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
