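Authors: Zhang Tingting, Qiu Zepeng, Zhao Lasheng, Mao Jiaying (Key Laboratory of Advanced Design & Intelligent Computing, Ministry of Education, Dalian University, Dalian, Liaoning 116622, China)
Affiliation: [1] Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education (jointly established by Liaoning Province and the Ministry), Dalian University, Dalian, Liaoning 116622
Source: Application Research of Computers (《计算机应用研究》), 2024, No. 12, pp. 3658-3663 (6 pages)
Funding: Basic Scientific Research Project of the Liaoning Provincial Department of Education (LJKMZ20221838); "111" Program (D23006); Dalian Science and Technology Innovation Fund Program (2023JJ11CG002).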
Abstract: In real-world settings, noise corrupts both the temporal and the frequency information of speech, lowering the accuracy of keyword spotting models in noisy environments. To address this, this paper proposes a dual-branch fusion unit in which a temporal branch and a frequency branch extract temporal and frequency features in parallel, reducing the information loss caused by serially stacking temporal and frequency convolutions. Cross-fusion then strengthens the model's perception of time-frequency information, further improving its feature representation. The paper also proposes a temporal-frequency squeeze-and-excitation module that models how the importance of information is distributed across the time and frequency domains, allowing the model to selectively attend to informative segments and further improving its robustness. On the Google Command v2-12 dataset, the proposed model achieves higher recognition accuracy than the comparison models in tests at different signal-to-noise ratios while using fewer parameters, and it generalizes better to signal-to-noise ratios that were not covered during training. These results indicate that the proposed model has advantages in both recognition accuracy and parameter count, along with better noise robustness.
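Keywords: keyword spotting; dual-branch fusion; temporal-frequency squeeze and excitation; robust model; attention mechanism
Classification number: TP391.42 [Automation and Computer Technology / Computer Application Technology]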
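The abstract names two building blocks: a dual-branch unit that runs temporal and frequency convolutions in parallel on the same input (rather than stacking one after the other), and a temporal-frequency squeeze-and-excitation (SE) module that reweights frames and frequency bins. The sketch below illustrates only those two ideas and is not the authors' implementation. It assumes a single-channel feature map stored as a T×F list of lists, fixed (unlearned) 1-D kernels for each branch, an element-wise sum standing in for the paper's cross-fusion (whose exact form the abstract does not give), and a parameter-free sigmoid of mean descriptors in place of any learned excitation layers. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # feature map indexed as x[t][f]: T frames by F frequency bins


def conv1d_same(seq: List[float], kernel: List[float]) -> List[float]:
    """1-D cross-correlation with zero padding; output length equals input length (odd kernel assumed)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(seq):
                acc += w * seq[j]
        out.append(acc)
    return out


def temporal_branch(x: Matrix, kernel: List[float]) -> Matrix:
    """Convolve along time, independently for every frequency bin."""
    T, F = len(x), len(x[0])
    cols = [conv1d_same([x[t][f] for t in range(T)], kernel) for f in range(F)]
    return [[cols[f][t] for f in range(F)] for t in range(T)]


def frequency_branch(x: Matrix, kernel: List[float]) -> Matrix:
    """Convolve along frequency, independently for every time frame."""
    return [conv1d_same(row, kernel) for row in x]


def dual_branch_fusion(x: Matrix, k_time: List[float], k_freq: List[float]) -> Matrix:
    # Both branches read the SAME input (parallel), instead of one feeding the other (serial).
    t_out = temporal_branch(x, k_time)
    f_out = frequency_branch(x, k_freq)
    # Assumption: element-wise sum as a placeholder for the paper's cross-fusion.
    return [[a + b for a, b in zip(rt, rf)] for rt, rf in zip(t_out, f_out)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def tf_squeeze_excite(x: Matrix) -> Matrix:
    T, F = len(x), len(x[0])
    # Squeeze: one descriptor per frame (mean over frequency) and one per bin (mean over time).
    time_desc = [sum(row) / F for row in x]
    freq_desc = [sum(x[t][f] for t in range(T)) / T for f in range(F)]
    # Excite: map descriptors to (0, 1) weights. Assumption: no learned layers here.
    w_time = [sigmoid(d) for d in time_desc]
    w_freq = [sigmoid(d) for d in freq_desc]
    # Rescale every time-frequency cell by its frame weight and its bin weight.
    return [[x[t][f] * w_time[t] * w_freq[f] for f in range(F)] for t in range(T)]


if __name__ == "__main__":
    # Toy 4-frame by 3-bin "spectrogram".
    x = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]]
    y = tf_squeeze_excite(dual_branch_fusion(x, [0.25, 0.5, 0.25], [0.25, 0.5, 0.25]))
    print(len(y), len(y[0]))  # 4 3
```

Because each SE weight lies strictly between 0 and 1, the module can only attenuate cells, so frames and bins with weak average activation are suppressed relative to strong ones. A trained model would learn both the branch kernels and the excitation mapping.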