Infrared Human Action Recognition Method Based on Multimodal Attention Network

Cited by: 1


Authors: WANG Chao [1]; TANG Chao [1]; WANG Wenjian [2]; ZHANG Jing

Affiliations: [1] School of Artificial Intelligence and Big Data, Hefei University, Hefei 230601, China; [2] School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; [3] Science Island Branch of Graduate School, University of Science and Technology of China, Hefei 230031, China

Source: Computer Science (《计算机科学》), 2024, Issue 8, pp. 232-241 (10 pages)

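Funding: National Natural Science Foundation of China (62076154, U21A20513); Natural Science Foundation of Anhui Province (2008085MF202); Hefei University Scientific Research Project (22050123010); Anhui Provincial Graduate Academic Innovation Project (2022xscx145); Anhui Provincial College Students' Innovation and Entrepreneurship Training Program (1602582519599861760).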

Abstract: Human behavior recognition is a research hotspot in machine vision and pattern recognition, with wide application value in fields such as intelligent monitoring and smart homes, where many intelligent services require rapid and accurate recognition. It usually relies on visible-light video, which is easily affected by illumination and cannot support nighttime recognition. Because infrared data is less affected by lighting and better protects privacy, recognition methods based on infrared video are of great significance. However, deep learning networks are limited in their ability to learn representations from single-modality infrared data. To address this problem, an infrared human behavior recognition method based on a multimodal attention network is proposed. Because the deep learning network model cannot be trained on, or classify, raw video directly, a preprocessing module first converts the video into infrared views; the Sobel operator and a total variation optical flow method based on the L1 norm then extract edge information and optical flow information from the infrared views, yielding edge views and optical flow views. Second, the infrared views, edge views, and optical flow views are each fed into a three-stream network that incorporates an attention mechanism module for feature learning. Third, the multimodal features extracted by each network in the three-stream network are fused. Finally, the fused feature vector is input to a random forest for training and classification. Experiments on the public dataset NTU RGB+D and a self-built dataset indicate that the proposed method achieves good recognition performance. Future work will consider extending the method to more datasets to verify its effectiveness.

Keywords: multimodal; attention mechanism; three-stream network; feature fusion; random forest

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
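
Two of the steps named in the abstract, building the edge view with the Sobel operator and fusing the per-stream features, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `sobel_edge_view` and `fuse_features`, the replicated-border handling, the use of gradient magnitude as the edge view, and fusion by plain concatenation are all assumptions made for illustration; the abstract does not specify any of them. The optical flow step, the three-stream attention network, and the random forest classifier are not covered here.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_edge_view(frame):
    """Return the Sobel gradient-magnitude image of a 2-D grayscale frame.

    `frame` is a list of equal-length rows of numbers. Border pixels are
    handled by replicating the nearest edge pixel (an assumption; the
    abstract does not say how borders are treated).
    """
    h, w = len(frame), len(frame[0])

    def px(r, c):
        # Clamp coordinates so that the border is replicated.
        return frame[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    edges = []
    for r in range(h):
        row = []
        for c in range(w):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    v = px(r + i - 1, c + j - 1)
                    gx += SOBEL_X[i][j] * v
                    gy += SOBEL_Y[i][j] * v
            # Edge strength is the magnitude of the gradient vector.
            row.append(math.hypot(gx, gy))
        edges.append(row)
    return edges


def fuse_features(infrared_feat, edge_feat, flow_feat):
    """Fuse per-stream feature vectors by concatenation (assumed scheme)."""
    return list(infrared_feat) + list(edge_feat) + list(flow_feat)


if __name__ == "__main__":
    # A toy 4x4 "infrared" frame with a vertical step edge in the middle.
    frame = [[0, 0, 10, 10] for _ in range(4)]
    for row in sobel_edge_view(frame):
        print(row)
    # Hypothetical feature vectors standing in for the three stream outputs.
    print(fuse_features([0.1, 0.2], [0.3], [0.4, 0.5]))
```

In practice the edge view would more likely come from a library routine such as OpenCV's `cv2.Sobel`, and the fused vectors could be passed to a classifier such as scikit-learn's `RandomForestClassifier`; whether the authors used these libraries is not stated in the record.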

 
