Multimodal scene classification for videos assisted by a mutual encoder (互编码器辅助视频的多模态场景分类)

Authors: HUANG Tianyang (黄天阳); HOU Yuanbo (侯元波); LI Shengchen (李圣辰); SHAO Xi (邵曦) [1]

Affiliations: [1] School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China; [2] School of Information Technology, Ghent University, Gent 9000, Belgium; [3] School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou 215123, Jiangsu, China

Source: Journal of Nanjing University of Posts and Telecommunications: Natural Science Edition (《南京邮电大学学报（自然科学版）》), 2023, No. 1, pp. 104-110 (7 pages)

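Funding: Supported by the National Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0106200) and the National Natural Science Foundation of China (61936005, 61872199, 61872424).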

Abstract: To address the low accuracy of multimodal scene classification, this paper proposes a multimodal scene classification method in which a mutual encoder assists the video branch. The audio branch extracts features from the input audio and applies a self-attention mechanism to obtain the salient information. The image branch first splits the video into frames and then extracts features from them with a ResNet50 network. The features of the two modalities are then fed into the mutual encoder, which fuses them by extracting the hidden-layer features of each modality. The fused features are combined with an attention mechanism to assist the video features. In this model, the mutual encoder serves as an auxiliary system for feature fusion. Experiments on the DCASE2021 Challenge Task 1B dataset show that the mutual encoder improves classification accuracy.
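The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer sizes, loss, or exact form of the attention step, so every function name, dimension, and weight below is a hypothetical stand-in. In particular, the sketch assumes a single-layer encoder per modality, random (untrained) weights, a sigmoid gate as the "attention" that lets the fused feature assist the video feature, 2048-dimensional pooled ResNet50 embeddings, and the 10 scene classes of DCASE2021 Task 1B.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Scaled dot-product self-attention over time frames.

    Assumption: no learned query/key/value projections, for brevity.
    x has shape (T, D); the result has the same shape.
    """
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x


def init_params(d_audio, d_video, hidden, n_classes, rng):
    """Random weights standing in for trained ones (hypothetical layout)."""
    def w(rows, cols):
        return rng.standard_normal((rows, cols)) / np.sqrt(rows)
    return {
        "w_a": w(d_audio, hidden),       # audio encoder
        "w_v": w(d_video, hidden),       # video encoder
        "w_f": w(2 * hidden, d_video),   # fusion of both hidden features
        "w_c": w(d_video, n_classes),    # scene classifier
    }


def classify_scene(audio_feat, video_feat, params):
    """audio_feat: (T_a, D_a) frame-level audio features.
    video_feat: (T_v, D_v) per-frame image embeddings (e.g. from ResNet50).
    Returns a probability vector over scene classes.
    """
    # Audio branch: self-attention, then average over time.
    audio_vec = self_attention(audio_feat).mean(axis=0)
    # Image branch: average the per-frame embeddings.
    video_vec = video_feat.mean(axis=0)

    # Mutual encoder (assumed form): hidden-layer feature per modality.
    h_a = np.tanh(audio_vec @ params["w_a"])
    h_v = np.tanh(video_vec @ params["w_v"])
    fused = np.concatenate([h_a, h_v]) @ params["w_f"]

    # Assumed attention step: the fused feature gates the video feature,
    # with a residual connection so the video feature stays dominant.
    gate = 1.0 / (1.0 + np.exp(-fused))
    assisted = video_vec + gate * video_vec

    return softmax(assisted @ params["w_c"])


rng = np.random.default_rng(0)
params = init_params(d_audio=64, d_video=2048, hidden=128, n_classes=10, rng=rng)
audio = rng.standard_normal((100, 64))    # stand-in for audio features
video = rng.standard_normal((10, 2048))   # stand-in for ResNet50 embeddings
probs = classify_scene(audio, video, params)
predicted_class = int(np.argmax(probs))
```

With untrained weights the predicted class is meaningless; the sketch only shows how the audio branch, the image branch, the mutual encoder, and the attention step connect.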

Keywords: audio-visual scene classification; self-attention mechanism; multimodal learning; encoder; variational autoencoder

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
