Stuttering Speech Classification Based on Self-Supervised Pre-Trained Model and NWCE

Authors: YIN Zhipeng (殷志鹏); XU Xinzhou (徐新洲) (School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)

Affiliation: [1] School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China

Source: Journal of North University of China (Natural Science Edition), 2025, No. 1, pp. 19-26 (8 pages)

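Funding: China Postdoctoral Science Foundation, General Program (2022M711693); National Natural Science Foundation of China, General Program (62071242, 62172235); Natural Science Foundation of Nanjing University of Posts and Telecommunications (NY222158).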

Abstract: Stuttering speech classification aims to classify and recognize different categories of stuttering from speech signals. However, existing work does not fully account for the sequential characteristics of the representation embeddings produced by self-supervised pre-trained models, and it characterizes the class imbalance of stuttering-speech data only in a simple way. To address this, we propose a stuttering speech classification approach based on self-supervised pre-trained models and a nonlinear weighted cross-entropy (NWCE) loss. The approach first uses a self-supervised pre-trained model to extract paralinguistic representation embeddings from stuttering speech. It then applies a bidirectional long short-term memory (BiLSTM) network with a self-attention mechanism to capture the salient temporal features and contextual information in the embeddings. Finally, it uses the NWCE loss to focus on the stuttering categories that have fewer samples. Experimental results on a stuttering speech classification dataset indicate that the proposed approach outperforms existing approaches: it learns the sequential information in the multilayer representation embeddings of the self-supervised pre-trained model, and NWCE describes the relationship between the data of the different stuttering categories more fully.
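The abstract does not state the NWCE formula, so the following is only a minimal sketch of the general idea of a class-weighted cross-entropy whose weights depend nonlinearly on class frequency. The power-law mapping, the `gamma` parameter, and all function names are assumptions made for illustration; they are not the authors' definition of NWCE.

```python
import math


def nonlinear_class_weights(class_counts, gamma=0.5):
    """Hypothetical weighting (not the paper's NWCE definition):
    inverse class frequency raised to the power gamma, then rescaled
    so that the weights average to 1."""
    total = sum(class_counts)
    raw = [(total / n) ** gamma for n in class_counts]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_cross_entropy(batch_logits, batch_labels, weights):
    """Weighted mean of per-sample negative log-likelihoods,
    where each sample is weighted by the weight of its true class."""
    num = 0.0
    den = 0.0
    for logits, label in zip(batch_logits, batch_labels):
        probs = softmax(logits)
        num += -weights[label] * math.log(probs[label])
        den += weights[label]
    return num / den


# Illustrative class counts (made up, not taken from the paper's dataset).
weights = nonlinear_class_weights([900, 80, 20], gamma=0.5)
loss = weighted_cross_entropy(
    [[2.0, 0.5, -1.0], [0.1, 0.2, 1.5]],  # logits for two samples
    [0, 2],                                # true class indices
    weights,
)
```

With `gamma=1` this reduces to ordinary inverse-frequency weighting, and with `gamma=0` to unweighted cross-entropy; intermediate values increase the emphasis on rare classes less steeply than linear inverse-frequency weighting does.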

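Keywords: computational paralinguistics; stuttering speech classification; self-supervised pre-trained model; nonlinear weighted cross-entropy loss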

Classification code: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
