Time Domain Speech Separation Using Auxiliary Pitch Information  (Cited by: 2)

Authors: WANG Kai; LI Minghe; HUANG Zhihua[1]; HUANG Hao[1]

Affiliation: [1] School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830017, China

Source: Journal of Xinjiang University (Natural Science Edition in Chinese and English), 2022, No. 2, pp. 182-188 (7 pages)

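Funding: Open Project of the Xinjiang Key Laboratory of Multilingual Information Technology (2020D04047); National Key Research and Development Program (2020AAA0107902); National Natural Science Foundation of China (61663044, 61761041).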

Abstract: Conventional single-channel speech separation methods take only the mixture as input and separate it to obtain each target speaker's speech. Recent work has shown that injecting pre-estimated pitch (fundamental frequency) information into the original mixture improves separation, but this approach was first applied in the time-frequency (T-F) domain. In recent years, time-domain speech separation has been shown to outperform earlier T-F domain methods. Motivated by this, we propose a time-domain speech separation method that uses auxiliary pitch information. The time-domain signal is first fed to a pre-separation module that generates pre-separated sources, from which pitch is extracted. The extracted pitch is then concatenated with the original mixture and used as the input to a post-separation module, which performs a second separation pass. We evaluate different pitch trackers and training strategies. Experiments show that the best performance is obtained by first training the post-separation module on the mixture fused with ideal pitch, and then fine-tuning it on estimated pitch extracted with RAPT from the pre-separated sources; this yields a 0.5 dB improvement over the Conv-TasNet baseline. Explicitly injecting auxiliary pitch information, already shown to be effective for T-F domain speech separation, is therefore also applicable to time-domain speech separation.
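The input-construction step described in the abstract (extract pitch from each pre-separated source, then join it with the mixture) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say how the pitch is fused with the waveform, so the sketch assumes the frame-level pitch track is held constant over each hop and stacked with the mixture as extra channels. The function names `autocorr_pitch` and `build_post_input`, the frame and hop sizes, and the 60-400 Hz search range are all hypothetical, and a crude autocorrelation peak picker stands in for RAPT. The pre- and post-separation networks themselves are not shown.

```python
import numpy as np

def autocorr_pitch(signal, sample_rate, frame_len=512, hop=256,
                   fmin=60.0, fmax=400.0):
    """Crude frame-level F0 estimate by autocorrelation peak picking.

    Stand-in for a real pitch tracker; this is NOT RAPT.
    Returns one F0 value (Hz) per frame, 0.0 for silent frames.
    """
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    n_frames = 1 + (len(signal) - frame_len) // hop
    f0 = np.zeros(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        frame = frame - frame.mean()
        if np.allclose(frame, 0.0):
            continue  # leave silent frames at 0 Hz
        # Keep only the non-negative lags of the autocorrelation.
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        lag = lag_min + int(np.argmax(ac[lag_min : lag_max + 1]))
        f0[i] = sample_rate / lag
    return f0

def build_post_input(mixture, pre_separated, sample_rate, hop=256):
    """Stack the mixture with one sample-rate pitch track per source.

    Assumed fusion scheme: output shape is (1 + n_sources, n_samples).
    """
    channels = [mixture]
    for src in pre_separated:
        f0 = autocorr_pitch(src, sample_rate, hop=hop)
        f0_samples = np.repeat(f0, hop)  # hold each frame value for `hop` samples
        track = np.zeros(len(mixture))   # zero-pad the tail not covered by frames
        n = min(len(track), len(f0_samples))
        track[:n] = f0_samples[:n]
        channels.append(track)
    return np.stack(channels)

# Usage: two synthetic tones stand in for the pre-separated sources.
sr = 8000  # assumed sample rate
t = np.arange(sr) / sr
s1 = np.sin(2 * np.pi * 200 * t)
s2 = np.sin(2 * np.pi * 125 * t)
post_input = build_post_input(s1 + s2, [s1, s2], sr)
```

In the training strategy the abstract reports as best, the pitch tracks would first come from ideal pitch and later, during fine-tuning, from pitch estimated on the pre-separated sources; only the source of the pitch tracks changes, not the shape of the input.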

Keywords: speech separation; single-channel; pitch; time domain

Classification code: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]

 
