Estimation algorithm of driver's gaze zone based on lightweight spatial feature encoding network (cited by: 2)


Authors: ZHANG Mingfang, LI Guilin, WU Chuna [2], WANG Li [1], TONG Lianghao

Affiliations: [1] Beijing Key Laboratory of Urban Road Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China; [2] Key Laboratory of Operation Safety Technology on Transport Vehicles, Research Institute of Highway, Ministry of Transport, Beijing 100088, China

Source: Journal of Tsinghua University (Science and Technology), 2024, No. 1, pp. 44-54 (11 pages)

Funding: National Natural Science Foundation of China (51905007); Scientific Research Program of Beijing Municipal Education Commission (KM202210009013)

Abstract: [Objective] Real-time monitoring of a driver's gaze zone helps human-machine shared-driving vehicles understand and predict the driver's intentions. Because of the limited computational resources and storage capacity of in-vehicle platforms, existing gaze zone estimation algorithms struggle to balance accuracy with real-time performance and ignore temporal information. [Methods] This paper therefore proposes a driver gaze zone estimation algorithm based on a lightweight spatial feature encoding network (LSFENet). First, an RGB camera captures an image sequence of the driver's upper body. Preprocessing steps, including face alignment and glasses removal, produce left- and right-eye images and facial keypoint coordinates, handling challenges such as cluttered backgrounds and facial occlusions in the captured images. Face alignment uses the multi-task cascaded convolutional network (MTCNN) algorithm, and glasses are removed with the cycle-consistent adversarial network (CycleGAN) algorithm. Second, the LSFENet feature extraction network is built on GCS-bottleneck modules that improve the MobileNetV2 architecture, because the inverted residual structure in MobileNetV2 requires considerable memory and floating-point operations and ignores the redundancy and correlation among feature maps. A ghost module is embedded to reduce memory consumption, and channel and spatial attention modules are integrated to extract cross-channel and spatial information from the feature maps and strengthen the weights of key features, yielding left- and right-eye features. Next, the Kronecker product fuses the eye features with the facial keypoint features to reduce the impact of the imbalance in their information complexity. The fused features of consecutive frames are then fed into a recurrent neural network to estimate the gaze zone of the image sequence. Finally, the network is evaluated on the public driver gaze in the wild (DGW) dataset and a self-collected dataset; the evaluation metrics include the number of parameters and the floating-point operations. [Results] LSFENet achieves a gaze zone estimation accuracy of 97.08% and processes about 103 frames per second, meeting the efficiency and accuracy requirements of in-vehicle environments. Its estimation accuracy for gaze zones 1, 2, 3, 4, and 9 exceeds 85%, and it adapts well to different lighting conditions and glasses occlusion. [Conclusions] These results are significant for identifying drivers' visual distraction.
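The Kronecker-product fusion step described above can be sketched as follows with NumPy. The feature dimensions and variable names are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def fuse_features(eye_feat: np.ndarray, keypoint_feat: np.ndarray) -> np.ndarray:
    """Fuse an eye-feature vector with a facial-keypoint feature vector via
    the Kronecker product, producing every pairwise product of entries."""
    return np.kron(eye_feat, keypoint_feat)

# Illustrative dimensions only: 8-D eye features, 4-D keypoint features.
eye = np.random.rand(8)
kpt = np.random.rand(4)
fused = fuse_features(eye, kpt)
print(fused.shape)  # (32,)
```

For vectors, the Kronecker product equals the flattened outer product, so the fused vector carries all cross-terms between the two modalities, which is one way to soften the imbalance in their information complexity.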

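The sequence-level stage, in which fused per-frame features are fed into a recurrent neural network, can be sketched as a plain Elman RNN. The weights are random and the sizes illustrative, so this is not the paper's trained network; the nine-way output merely mirrors the numbered gaze zones mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: 32-D fused features per frame, 16-D hidden state.
feat_dim, hidden_dim, n_zones, n_frames = 32, 16, 9, 5

W_x = rng.standard_normal((hidden_dim, feat_dim)) * 0.1    # input weights
W_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # recurrent weights
W_o = rng.standard_normal((n_zones, hidden_dim)) * 0.1     # output weights

def estimate_gaze_zone(frames: np.ndarray) -> int:
    """Run an Elman RNN over the fused features of consecutive frames and
    classify the sequence's gaze zone from the final hidden state."""
    h = np.zeros(hidden_dim)
    for x in frames:                    # one fused feature vector per frame
        h = np.tanh(W_x @ x + W_h @ h)  # recurrent state update
    return int(np.argmax(W_o @ h))      # index of the highest-scoring zone

frames = rng.standard_normal((n_frames, feat_dim))
zone = estimate_gaze_zone(frames)
```

Because the hidden state is carried across frames, the classification uses temporal information that single-frame estimators discard.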
Keywords: gaze zone estimation; lightweight spatial feature encoding network; attention mechanism; feature extraction; Kronecker product; recurrent neural network
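The channel and spatial attention modules named above reweight feature maps so that informative channels and locations dominate. A generic squeeze-and-excitation-style channel-attention pass, shown here as one common form of the idea rather than the paper's exact GCS-bottleneck module, can be written as:

```python
import numpy as np

def channel_attention(feat_map: np.ndarray) -> np.ndarray:
    """Reweight a (C, H, W) feature map per channel: global average pooling
    squeezes each channel to one descriptor, and a sigmoid gate in (0, 1)
    scales the channel by its estimated importance."""
    squeeze = feat_map.mean(axis=(1, 2))     # (C,) channel descriptors
    gate = 1.0 / (1.0 + np.exp(-squeeze))    # sigmoid gate per channel
    return feat_map * gate[:, None, None]    # broadcast over H and W

fmap = np.random.rand(4, 6, 6)   # illustrative 4-channel feature map
out = channel_attention(fmap)
print(out.shape)  # (4, 6, 6)
```

Real squeeze-and-excitation blocks place a small learned bottleneck MLP between the pooling and the gate; it is omitted here for brevity.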

Classification: U495 (Transportation Engineering: Transportation Planning and Management)

 
