Unsupervised Speech Representation Learning Based on VQ-VAE Model and Do-Conv Layer

Authors: LIU Xuepeng, ZHANG Wenlin, CHEN Zilong (Information Engineering University, Zhengzhou 450001, China)

Affiliation: Information Engineering University, Zhengzhou 450001, Henan, China

Source: Journal of Information Engineering University, 2022, No. 5, pp. 513-519

Funding: National Natural Science Foundation of China (61673395, 62171470)

Abstract: To address the problem of extracting speech representations from speech signals under unsupervised conditions, this paper proposes the Do-VQVAE model. The model is built on the structure of the vector quantized variational autoencoder (VQ-VAE), with depthwise over-parameterized convolution (Do-Conv) layers introduced to form the encoder. Through its encoder-decoder structure, the model extracts features of the speech signal in an unsupervised manner and quantizes the encoder output by mapping it onto a codebook, yielding discrete speech representations. Mutual information neural estimation (MINE) is also introduced in the experiments to improve the speaker invariance of the learned representations. The proposed model is trained and tested on the ZeroSpeech 2019 challenge dataset; its ABX error rate is significantly lower than that of both the baseline and a convolutional VQ-VAE model, and it achieves results comparable to the best-performing systems. (Illustrative sketches of the three components are given below.)

Keywords: speech representation; unsupervised learning; acoustic unit discovery; ZeroSpeech challenge

CLC number: TN912.34 (Electronics and Telecommunications / Communication and Information Systems)
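
The abstract describes three technical components: a Do-Conv-based encoder, a VQ-VAE codebook bottleneck, and a MINE term for speaker invariance. The sketches below are minimal PyTorch re-implementations for illustration only, not the authors' code; class names, shapes, and hyperparameters (e.g., d_mul, num_codes=512, beta=0.25) are assumptions, not the paper's settings.

A depthwise over-parameterized convolution trains two factors, a conventional kernel W and a depthwise kernel D, which fold into a single ordinary kernel, so the over-parameterization adds no inference cost once merged. A 1-D variant suited to speech frames might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DOConv1d(nn.Module):
    """Sketch of a 1-D depthwise over-parameterized convolution (Do-Conv)."""
    def __init__(self, in_ch, out_ch, kernel_size, d_mul=None, stride=1, padding=0):
        super().__init__()
        self.stride, self.padding = stride, padding
        d_mul = d_mul or kernel_size           # over-parameterized when d_mul >= kernel_size
        # conventional factor W: (out_ch, in_ch, d_mul)
        self.W = nn.Parameter(torch.randn(out_ch, in_ch, d_mul) * 0.02)
        # depthwise factor D: (in_ch, d_mul, kernel_size), identity-like init
        # so the composed kernel initially equals W alone
        D = torch.zeros(in_ch, d_mul, kernel_size)
        for k in range(kernel_size):
            D[:, k % d_mul, k] = 1.0
        self.D = nn.Parameter(D)

    def forward(self, x):                      # x: (batch, in_ch, time)
        # fold W and D into one ordinary kernel: (out_ch, in_ch, kernel_size)
        kernel = torch.einsum('ocd,cdk->ock', self.W, self.D)
        return F.conv1d(x, kernel, stride=self.stride, padding=self.padding)
```

The quantization step maps each encoder output frame to its nearest codeword and passes gradients through with the straight-through estimator, as in the standard VQ-VAE; the discrete indices serve as the learned acoustic units:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Sketch of a VQ-VAE codebook bottleneck with a straight-through estimator."""
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta                       # commitment-loss weight

    def forward(self, z_e):                    # z_e: (batch, time, dim)
        flat = z_e.reshape(-1, z_e.size(-1))
        # squared Euclidean distance from each frame to every codeword
        dist = (flat.pow(2).sum(1, keepdim=True)
                - 2.0 * flat @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(1))
        idx = dist.argmin(dim=1)               # discrete unit index per frame
        z_q = self.codebook(idx).view_as(z_e)
        # codebook loss moves codewords toward encodings; commitment loss does the reverse
        loss = (F.mse_loss(z_q, z_e.detach())
                + self.beta * F.mse_loss(z_e, z_q.detach()))
        # straight-through: copy gradients from z_q back to the encoder output
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx, loss
```

Mutual information neural estimation bounds I(Z; S) from below with a statistics network; maximizing the bound trains the estimator, while minimizing the estimate with respect to the encoder penalizes speaker information in the representation. One way to write the Donsker-Varadhan bound (the weighting against the VQ-VAE loss is not specified here):

```python
import math
import torch
import torch.nn as nn

class MINE(nn.Module):
    """Sketch of the Donsker-Varadhan lower bound on I(Z; S)."""
    def __init__(self, z_dim, s_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + s_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, z, s):                   # z: (batch, z_dim), s: (batch, s_dim)
        joint = self.net(torch.cat([z, s], dim=-1)).mean()
        s_shuf = s[torch.randperm(s.size(0))]  # shuffle pairs -> product of marginals
        marg = self.net(torch.cat([z, s_shuf], dim=-1)).squeeze(-1)
        # E_p[T(z, s)] - log E_{p x p}[exp T(z, s')]
        return joint - (torch.logsumexp(marg, dim=0) - math.log(marg.size(0)))
```

In use, an encoder built from DOConv1d layers would feed VectorQuantizer, the decoder would reconstruct the signal from z_q, and the negative MINE estimate would be added to the training loss as a speaker-invariance penalty.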

 
