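Affiliation: [1] School of Intelligent Manufacturing and Electrical Engineering, Guangzhou Institute of Science and Technology, Guangzhou, Guangdong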
Source: Instrumentation and Equipments (《仪器与设备》), 2024, Issue 3, pp. 315-328 (14 pages)
Abstract: Multimodal speech separation methods combine visual and auditory information to improve on the separation performance of the auditory modality alone. Existing audio-visual fusion mechanisms have given little attention to the difference in feature scale between the two modalities, which limits the expression of high-dimensional visual semantic information and, in turn, separation performance. This paper therefore proposes a fusion method based on the scale of the visual modality: an encoder reduces the temporal scale of the auditory features and reconstructs speech features that incorporate visual-modality information. For mainstream baseline separation models, a temporal convolutional block that fuses dual-scale dilated convolutions is introduced to learn multi-dimensional feature information and further improve separation performance. The proposed multimodal speech separation method is evaluated on the GRID and TUT2016 datasets. Experimental results show improvements of 2.14 dB over the single-modality (audio-only) baseline model and 0.82 dB over the compared audio-visual speech separation model, confirming the effectiveness of the proposed method. Finally, based on interpretability analysis theory, the influence of the backbone network on separation performance is visualized, providing a theoretical basis for subsequent architecture design and for the interpretability of speech separation.
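The abstract names two mechanisms but gives no layer sizes, kernel sizes, dilation rates, or fusion operator. The pure-Python sketch below illustrates the general ideas only, under loudly stated assumptions: (1) shortening the audio feature sequence to the visual frame rate before concatenating the two modalities along the channel axis (the paper uses a learned encoder; plain average pooling is swapped in here as a stand-in), and (2) a temporal block that sums two dilated convolution branches with different dilation rates plus a residual path. The function names, the pooling factor of 4 (e.g. 100 Hz audio frames against 25 fps video), the shared kernel, and the dilation values 1 and 4 are all hypothetical, not the authors' configuration.

```python
# Hypothetical sketch, not the paper's model: all sizes, rates and names
# below are assumptions chosen for illustration.
from typing import List

Seq = List[List[float]]  # feature sequence laid out as [time][channel]


def pool_to_visual_rate(audio: Seq, factor: int) -> Seq:
    """Average-pool audio frames in time by `factor` so the audio sequence
    matches the (lower) video frame rate. Stand-in for a learned encoder."""
    steps = len(audio) // factor
    channels = len(audio[0])
    return [[sum(audio[t * factor + k][c] for k in range(factor)) / factor
             for c in range(channels)]
            for t in range(steps)]


def fuse(audio: Seq, video: Seq) -> Seq:
    """Concatenate time-aligned audio and visual features along channels."""
    assert len(audio) == len(video), "sequences must be time-aligned"
    return [a + v for a, v in zip(audio, video)]


def dilated_conv(x: Seq, kernel: List[float], dilation: int) -> Seq:
    """Non-causal 1-D dilated convolution with zero 'same' padding.
    Assumes an odd kernel length; one kernel is shared by all channels
    to keep the sketch short."""
    steps, channels = len(x), len(x[0])
    half = (len(kernel) - 1) // 2 * dilation
    out = []
    for t in range(steps):
        row = []
        for c in range(channels):
            acc = 0.0
            for k, w in enumerate(kernel):
                s = t + k * dilation - half  # dilated tap position
                if 0 <= s < steps:           # zero padding outside range
                    acc += w * x[s][c]
            row.append(acc)
        out.append(row)
    return out


def dual_scale_block(x: Seq, kernel: List[float],
                     d_small: int = 1, d_large: int = 4) -> Seq:
    """Sum a short-range and a long-range dilated branch, then add the
    input back as a residual connection."""
    a = dilated_conv(x, kernel, d_small)
    b = dilated_conv(x, kernel, d_large)
    return [[xi + ai + bi for xi, ai, bi in zip(xr, ar, br)]
            for xr, ar, br in zip(x, a, b)]


if __name__ == "__main__":
    audio = [[float(t), 1.0] for t in range(8)]   # 8 audio frames, 2 channels
    video = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]    # 2 video frames, 3 channels
    fused = fuse(pool_to_visual_rate(audio, 4), video)
    print(dual_scale_block(fused, [0.25, 0.5, 0.25]))
```

The two dilation rates give the block a narrow and a wide temporal receptive field at the same depth; in a trained network each branch would have its own learned per-channel kernels rather than the single shared kernel used here.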
Classification number: TP3 [Automation and Computer Technology / Computer Science and Technology]