Authors: LIU Xuepeng; ZHANG Wenlin; CHEN Zilong
Affiliation: Information Engineering University, Zhengzhou 450001, Henan, China
Source: Journal of Information Engineering University, 2022, No. 5, pp. 513-519 (7 pages)
Funding: National Natural Science Foundation of China (61673395, 62171470)
Abstract: To address the problem of extracting speech representations from speech signals under unsupervised conditions, this paper proposes the Do-VQVAE model. The model is implemented mainly on the structure of the vector quantized variational autoencoder (VQ-VAE), with a depthwise over-parameterized convolution layer introduced to form the encoder. Through an encoder-decoder structure, the model extracts features of the speech signal in an unsupervised manner and quantizes the encoder output through a codebook mapping to obtain discrete speech representations. Mutual information neural estimation is also introduced during training to improve the speaker invariance of the learned representations. The proposed model is trained and tested on the ZeroSpeech 2019 challenge dataset; its ABX error rate is significantly lower than that of both the baseline and the convolutional VQ-VAE model, and it achieves results comparable to the best systems.
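The paper's own implementation is not reproduced here, but the quantization step the abstract describes (mapping encoder outputs to their nearest codebook entries to obtain discrete representations) can be sketched as follows. This is a minimal illustrative sketch of standard VQ-VAE quantization in PyTorch, not the authors' code; the class name, codebook size, and embedding dimension are assumptions for the example.

```python
# Illustrative sketch of VQ-VAE codebook quantization (not the paper's code).
# Codebook size (num_codes) and embedding dimension (dim) are assumed values.
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 64):
        super().__init__()
        # Learnable codebook: num_codes discrete embeddings of size dim.
        self.codebook = nn.Embedding(num_codes, dim)
        nn.init.uniform_(self.codebook.weight, -1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z_e: torch.Tensor):
        # z_e: encoder output, shape (batch, time, dim).
        # Squared Euclidean distance from each frame to every codebook entry.
        book = self.codebook.weight.unsqueeze(0).expand(z_e.size(0), -1, -1)
        dist = torch.cdist(z_e, book)
        indices = dist.argmin(dim=-1)   # discrete code index per frame
        z_q = self.codebook(indices)    # quantized (discrete) representation
        # Straight-through estimator: gradients bypass the non-differentiable argmin.
        z_q_st = z_e + (z_q - z_e).detach()
        return z_q_st, indices
```

At training time a VQ-VAE additionally uses a codebook loss and a commitment loss (or EMA codebook updates) so the encoder outputs and codebook entries converge toward each other; the straight-through estimator above lets reconstruction gradients flow from the decoder back to the encoder despite the discrete lookup.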
Keywords: speech representation; unsupervised; acoustic unit discovery; ZeroSpeech challenge
CLC Number: TN912.34 (Electronics and Telecommunications: Communication and Information Systems)