Authors: MA Yatong; WANG Song[1,2]; LIU Yingfang
Affiliations: [1] School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China; [2] Gansu Provincial Engineering Research Center for Artificial Intelligence and Graphic and Imaging Processing, Lanzhou 730070, China
Source: Computer Engineering, 2022, No. 9, pp. 180-188 (9 pages)
Funding: National Natural Science Foundation of China (62067006); Natural Science Foundation of Gansu Province (21JR7RA291); Gansu Provincial Education Science and Technology Innovation Project (2021jyjbgs-05); Gansu Provincial Higher Education Industry Support Program (2020C-19).
Abstract: Human action recognition based on multimodal fusion has been widely studied and applied. In existing approaches, feature-level or decision-level fusion is performed at a single level or stage, so the true semantic information in the data cannot be mapped to the classifier. This paper proposes a multilevel multimodal fusion method for human action recognition that is better suited to practical application scenarios. At the input end, depth data are converted into Depth Motion Maps (DMM) and inertial data into signal images; each input modality is then further expanded into multiple modalities by processing the depth motion maps and signal images with Local Ternary Patterns (LTP). All modalities are fed to convolutional neural networks for feature extraction, and the extracted features are fused at the feature level via Discriminant Correlation Analysis (DCA), which maximizes the correlation of corresponding features across the two feature sets while eliminating the correlation between features of different classes within each set. The fused features are finally used as the input to a multiclass support vector machine for action recognition. Experimental results on the UTD-MHAD and UTD Kinect V2 MHAD multimodal datasets show that the proposed multilevel multimodal fusion framework achieves recognition accuracies of 99.8% and 99.9%, respectively.
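The fusion stage described in the abstract can be made concrete with a short sketch. The Python code below is a minimal illustration of only the last two steps: a DCA-style feature-level fusion of two feature sets, following the general formulation of Haghighat et al. (2016), followed by a multiclass SVM. It assumes feature matrices that would normally come from the trained CNNs (here replaced by random toy vectors); the DMM, signal-image, and LTP preprocessing and the CNN training are not reproduced, and the names dca_transforms and whiten_between_class are illustrative, not taken from the paper's implementation.

import numpy as np
from sklearn.svm import SVC

def dca_transforms(X, Y, labels, eps=1e-10):
    # Discriminant Correlation Analysis (DCA) sketch. X: (n, p) and Y: (n, q) are
    # feature sets for the same n samples; labels: (n,) class labels.
    classes = np.unique(labels)

    def whiten_between_class(F):
        # Between-class scatter S_b = Phi @ Phi.T, where column k of Phi is
        # sqrt(n_k) * (mean of class k - overall mean). Returns W with W.T @ S_b @ W = I,
        # solved via the small c x c problem Phi.T @ Phi instead of the d x d scatter.
        mu = F.mean(axis=0)
        Phi = np.stack([np.sqrt(np.sum(labels == k)) * (F[labels == k].mean(axis=0) - mu)
                        for k in classes], axis=1)
        evals, Q = np.linalg.eigh(Phi.T @ Phi)
        keep = evals > eps
        return Phi @ Q[:, keep] / evals[keep]

    # Project each feature set into its class-discriminant subspace.
    Xp, Yp = X @ whiten_between_class(X), Y @ whiten_between_class(Y)
    # Align the two sets so that corresponding features are maximally correlated
    # and non-corresponding features are uncorrelated.
    U, s, Vt = np.linalg.svd(Xp.T @ Yp, full_matrices=False)
    keep = s > eps
    U, s, V = U[:, keep], s[keep], Vt[keep].T
    return Xp @ (U / np.sqrt(s)), Yp @ (V / np.sqrt(s))

# Toy demonstration: random vectors stand in for the CNN features of the two modalities.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(27), 20)                 # e.g. 27 action classes, 20 samples each
X = rng.normal(size=(labels.size, 512)) + 0.05 * labels[:, None]   # "depth" features
Y = rng.normal(size=(labels.size, 256)) + 0.05 * labels[:, None]   # "inertial" features
Xs, Ys = dca_transforms(X, Y, labels)
fused = np.hstack([Xs, Ys])                           # feature-level fusion by concatenation
clf = SVC(kernel='linear').fit(fused, labels)
print('training accuracy:', clf.score(fused, labels))

Because DCA projects each set into a subspace whose dimensionality is bounded by the number of classes, the fused vector is much smaller than the raw CNN features, which keeps the downstream multiclass SVM inexpensive to train.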
Keywords: human action recognition; depth motion maps; inertial sensor; local ternary pattern; discriminant correlation analysis
CLC number: TP391 [Automation and Computer Technology - Computer Application Technology]