Authors: Shi Jiuchen (史久琛)[1]; Sun Meijun (孙美君)[2]; Wang Zheng (王征)[2]; Zhang Dong (张冬)[3]
Affiliations: [1] School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; [2] College of Intelligence and Computing, Tianjin University, Tianjin 300072, China; [3] Institute of Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
Source: Journal of Tianjin University: Science and Technology (《天津大学学报(自然科学与工程技术版)》), 2019, No. 10, pp. 1062-1068 (7 pages)
Fund: Scientific Research Program of Tianjin Municipal Education Commission (2017KJ151)
Abstract: Video eye-fixation prediction marks the salient regions of interest in a video that attract human visual attention, and it has important applications in automatically extracting semantic information from large collections of videos. Starting from the limitations of the fully convolutional network, the current mainstream approach to saliency detection, this study proposes a deep learning model based on spatio-temporal features to predict eye-fixation regions in video. First, a fully convolutional network is used to extract spatial features from each video frame, and the optical flow method is used to extract temporal motion features between adjacent frames. A long short-term memory network then processes the spatial and temporal features of the current frame and its six preceding frames to produce the final eye-fixation prediction map. The model is evaluated on the INB and IVB eye-fixation video databases. On four evaluation criteria, namely earth mover's distance, area under the receiver operating characteristic curve, normalized scanpath saliency, and linear correlation coefficient, the method achieves 0.3751, 0.8186, 2.0241, 0.7457 and 0.4137, 0.7856, 1.9645, 0.7349 on the two databases, respectively, outperforming five comparison algorithms. This indicates that the proposed method yields accurate results for video eye-fixation prediction.
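The abstract describes the pipeline only at a high level: per-frame spatial features from a fully convolutional encoder, inter-frame temporal features from optical flow, and an LSTM over the current frame plus its six predecessors. As a rough illustration of how such a spatio-temporal fusion could be wired up, the following is a minimal sketch in PyTorch; the encoder layers, feature dimensions, output map size, and the SpatioTemporalFixationNet name are all illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch, assuming a PyTorch implementation; all layer sizes and
# names below are illustrative, not the paper's exact configuration.
import torch
import torch.nn as nn

class SpatioTemporalFixationNet(nn.Module):
    """Predict an eye-fixation map from 7 consecutive frames by fusing
    per-frame spatial features and optical-flow temporal features with an LSTM."""

    def __init__(self, feat_dim=256, hidden_dim=256, map_size=(56, 56)):
        super().__init__()
        self.map_size = map_size
        # Stand-in for the fully convolutional spatial encoder.
        self.spatial_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Stand-in encoder for the 2-channel optical-flow field.
        self.temporal_encoder = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # LSTM over the 7-frame window of concatenated spatial+temporal features.
        self.lstm = nn.LSTM(input_size=2 * feat_dim, hidden_size=hidden_dim,
                            batch_first=True)
        # Decode the last hidden state into a fixation probability map.
        self.decoder = nn.Linear(hidden_dim, map_size[0] * map_size[1])

    def forward(self, frames, flows):
        # frames: (B, 7, 3, H, W) current frame plus its 6 predecessors
        # flows:  (B, 7, 2, H, W) flow aligned to each frame (e.g. flow to its
        #         predecessor; the first entry may simply be zero-padded)
        b, t = frames.shape[:2]
        spat = self.spatial_encoder(frames.flatten(0, 1)).flatten(1)  # (B*T, F)
        temp = self.temporal_encoder(flows.flatten(0, 1)).flatten(1)  # (B*T, F)
        seq = torch.cat([spat, temp], dim=1).view(b, t, -1)           # (B, T, 2F)
        _, (h, _) = self.lstm(seq)
        fixation = torch.sigmoid(self.decoder(h[-1]))                 # (B, H'*W')
        return fixation.view(b, 1, *self.map_size)

# Example: one batch of two 7-frame windows and their flow fields.
model = SpatioTemporalFixationNet()
frames = torch.randn(2, 7, 3, 224, 224)
flows = torch.randn(2, 7, 2, 224, 224)
print(model(frames, flows).shape)  # torch.Size([2, 1, 56, 56])
```

Each frame contributes one concatenated spatial+temporal feature vector, so the 7-frame window from the abstract maps directly onto a length-7 input sequence for the LSTM, whose final hidden state is decoded into the fixation map.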
Keywords: video; eye fixation; spatio-temporal features; fully convolutional network; optical flow; long short-term memory network
Classification: TP37 [Automation and Computer Technology: Computer System Architecture]