Authors: WANG Yifan; ZHANG Xuefang (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430070, China)
Source: Computer Science (《计算机科学》), 2024, Issue S01, pp. 489-493 (5 pages)
Funding: National Key Research and Development Program of China (2019YFB1803600).
Abstract: Although artificial intelligence techniques have succeeded in many fields, they typically simulate only one kind of human perception and are therefore limited to processing information from a single modality. Extracting features from multiple modalities and fusing them effectively is important for advancing from weak (narrow) AI toward strong (general) AI. Using an encoder-decoder architecture, this study compares multimodal information fusion strategies on a video classification task: early feature fusion of the feature encodings of the modalities, late decision fusion of the prediction results of each modality, and a combination of the two. It also compares two ways of involving the audio modality in fusion: encoding the audio features directly, or converting the audio to text through speech recognition and fusing it in text form. Experiments show that, under the experimental setup of this study, decision fusion of the separate predictions of the text and audio modalities with the prediction from the fused features of the other two modalities further improves classification accuracy. Moreover, converting speech into text by automatic speech recognition (ASR) makes fuller use of the semantic information the speech contains.
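The fusion strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy linear classifier, the equal-weight averaging, and every numeric value below are assumptions made for illustration, and the record does not say how the paper's encoder-decoder model computes its features or combines its predictions.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_fusion(features_by_modality):
    """Early (feature-level) fusion: concatenate per-modality feature vectors."""
    fused = []
    for feats in features_by_modality:
        fused.extend(feats)
    return fused


def late_fusion(probs_by_predictor, weights=None):
    """Late (decision-level) fusion: weighted average of class probabilities.

    Equal weights are an assumption; the source does not state a weighting.
    """
    count = len(probs_by_predictor)
    if weights is None:
        weights = [1.0 / count] * count
    num_classes = len(probs_by_predictor[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, probs_by_predictor))
        for c in range(num_classes)
    ]


def linear_classifier(features, weight_rows):
    """Hypothetical stand-in classifier: one weight row per class, then softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weight_rows]
    return softmax(scores)


# Combined strategy: fuse two modalities at the feature level, then fuse that
# prediction with separate text-only and audio-only predictions at the
# decision level. All numbers are made up.
fused_features = early_fusion([[0.2, 0.7], [0.5, 0.1]])
fused_probs = linear_classifier(
    fused_features, [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
)
text_probs = [0.8, 0.2]
audio_probs = [0.4, 0.6]
combined_probs = late_fusion([text_probs, audio_probs, fused_probs])
print(combined_probs)
```

Early fusion lets one classifier see cross-modal interactions in the joint feature vector, while late fusion keeps each predictor independent and only merges their outputs; the combined variant above mixes both, which is the general shape of the configuration the abstract reports as most accurate.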
Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]