Authors: YIN Zhipeng (殷志鹏); XU Xinzhou (徐新洲) (School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Affiliation: [1] School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
Source: Journal of North University of China (Natural Science Edition) (《中北大学学报(自然科学版)》), 2025, No. 1, pp. 19-26 (8 pages)
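Funding: China Postdoctoral Science Foundation, General Program (2022M711693); National Natural Science Foundation of China, General Program (62071242, 62172235); Natural Science Foundation of Nanjing University of Posts and Telecommunications (NY222158).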
Abstract: Stuttering speech classification aims to identify different categories of stuttering from speech signals. Existing studies, however, do not fully exploit the temporal characteristics of the representation embeddings produced by self-supervised pre-trained models, and they model the class imbalance of stuttering speech data only in a simple way. To address this, the paper proposes a stuttering speech classification method based on self-supervised pre-trained models and a nonlinear weighted cross-entropy (NWCE) loss. The method first uses a self-supervised pre-trained model to extract paralinguistic representation embeddings. It then applies a bidirectional long short-term memory (BiLSTM) network with a self-attention mechanism to capture salient temporal features and contextual information in the embeddings. Finally, it uses the NWCE loss to focus on stuttering categories with fewer samples. Experiments on a stuttering speech classification dataset indicate that the method achieves better classification performance than existing methods, by learning the temporal information in the multilayer representation embeddings of the self-supervised pre-trained model and by using NWCE to describe the relationships among the data of the different stuttering categories.
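The abstract does not give the NWCE formula, so the following is only a minimal sketch of the general idea of a class-weighted cross-entropy whose weights grow nonlinearly as a class gets rarer. The power-law weighting `(n_max / n_c) ** gamma`, the parameter `gamma`, the function names, and the class counts are all illustrative assumptions, not the authors' definition.

```python
import math


def nonlinear_class_weights(class_counts, gamma=0.5):
    """Hypothetical nonlinear weighting (an assumption, not the paper's formula).

    Each class c gets a raw weight (n_max / n_c) ** gamma, so rarer classes
    receive larger weights. The weights are rescaled to have mean 1.
    """
    n_max = max(class_counts)
    raw = [(n_max / n) ** gamma for n in class_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(batch_logits, labels, weights):
    """Weighted mean of -w[y] * log p[y] over a batch."""
    total_loss = 0.0
    total_weight = 0.0
    for logits, y in zip(batch_logits, labels):
        p = softmax(logits)
        total_loss += -weights[y] * math.log(p[y])
        total_weight += weights[y]
    return total_loss / total_weight


# Hypothetical class counts for three categories (made up for illustration).
counts = [1000, 250, 40]
weights = nonlinear_class_weights(counts, gamma=0.5)

# Two toy samples: one from the frequent class 0, one from the rare class 2.
loss = weighted_cross_entropy([[2.0, 0.5, 0.1], [1.0, 0.2, 0.3]], [0, 2], weights)
print(weights, loss)
```

With `gamma=0` every class gets weight 1 and the loss reduces to ordinary cross-entropy; larger `gamma` shifts more of the loss toward the under-represented classes.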
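Keywords: computational paralinguistics; stuttering speech classification; self-supervised pre-trained model; nonlinear weighted cross-entropy loss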
Classification No.: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]