Voice Activity Detection Algorithm Based on Improved STAM

Authors: WU Rongbo; ZHOU Bin [1,2]; HU Bo

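Affiliations: [1] College of Computer Science, South-Central Minzu University, Wuhan 430074, China; [2] Key Laboratory of Information-Physics-Fusion-based Intelligent Computing of the National Ethnic Affairs Commission of the People's Republic of China, South-Central Minzu University, Wuhan 430074, China; [3] Wuhan Dongxin Tongbang Information Technology Co., Ltd., Wuhan 430074, China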

Source: Journal of South-Central Minzu University (Natural Science Edition), 2025, No. 3, pp. 384-392 (9 pages)

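Funding: Fundamental Research Funds for the Central Universities, South-Central Minzu University (CZY23006); Hubei Province Technology Innovation Special Fund Project (2019ADC071).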

Abstract: At low signal-to-noise ratio (SNR), background noise distorts signal features, so voice activity detection (VAD) is prone to false detections and missed detections. Existing methods are easily disturbed by noise, offer limited accuracy, and lack robustness. To address these problems, this paper optimizes STAM and proposes an improved VAD algorithm, Inception-ResNet STAM (IR-STAM). The algorithm replaces the traditional Log-Mel features with audio fingerprint (AFP) features to extract deeper features from the audio signal; it changes the convolution in the frequency attention module to depthwise separable convolution, which reduces the model's parameter count; and it adds an Inception-ResNet module, which strengthens the model's ability to capture and analyze features at different scales. Experimental results show that, on the TIMIT test set, IR-STAM has 150 k fewer parameters than STAM and improves the F1 score by more than 0.5 under each of the SNR conditions tested.

Keywords: low signal-to-noise ratio; Inception-ResNet module; audio fingerprint features; voice activity detection

Classification: O625.67 [Science: Organic Chemistry]; O643.3 [Science: Chemistry]
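
The abstract states that switching the frequency attention module to depthwise separable convolution lowers the parameter count. The sketch below illustrates why that substitution saves parameters in general. It is only an arithmetic illustration: the layer sizes (`C_IN`, `C_OUT`, `K`) and both helper functions are hypothetical and are not taken from the paper, and the numbers say nothing about the actual 150 k reduction reported for IR-STAM.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard k x k 2-D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Parameter count of a depthwise k x k convolution (one filter per
    input channel) followed by a 1 x 1 pointwise convolution."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration (not from the paper).
C_IN, C_OUT, K = 64, 64, 3

standard = conv2d_params(C_IN, C_OUT, K)
separable = depthwise_separable_params(C_IN, C_OUT, K)

print(f"standard conv:            {standard} parameters")
print(f"depthwise separable conv: {separable} parameters")
print(f"reduction factor:         {standard / separable:.1f}x")
```

Ignoring biases, the ratio of separable to standard parameters is 1/c_out + 1/k², so the saving grows with the number of output channels and the kernel size.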

 
