Authors: ZHENG Changyan, YANG Jibin, ZHANG Xiongwei, SUN Meng
Affiliations: [1] High-tech Institute, Fan Gong-ting South Street on the 12th, Qingzhou 262500; [2] Army Engineering University, Nanjing 210007
Source: Chinese Journal of Acoustics (声学学报(英文版)), 2022, No. 1, pp. 1-19 (19 pages)
Funding: Supported by the National Natural Science Foundation of China (62071484, 61471394) and the NSF of Jiangsu Province for Excellent Young Scholars (BK20180080).
Abstract: Compared with the phase spectrum, the magnitude spectrum carries most speech information. Many speech processing tasks therefore focus on manipulating the magnitude spectrum and use imperfect vocoder parameters or a mismatched phase spectrum to synthesize the waveform, which leads to obvious distortion of speech quality. To address this problem, a modified version of the WaveNet model fused with phase information is proposed to synthesize speech with higher quality. In the WaveNet model, the original or processed phase spectrum of the speech and the enhanced magnitude spectrum are concatenated as the condition input, and the predicted speech waveform is generated directly from this fusion feature. The proposed method makes effective use of the phase information and is verified in two tasks: voice conversion (VC) and bone-conducted speech enhancement (BSE). Two kinds of phase spectrum, the modified group delay (MGD) spectrum and the instantaneous frequency deviation spectrum, are compared comprehensively in simulation experiments, and the influence of the fusion feature on the bandwidth extension WaveNet model and the teacher-student WaveNet model is also explored. In the VC experiments, an A/B test shows that speech generated with the teacher-student WaveNet model is much better than speech generated with the STRAIGHT vocoder. In the BSE experiments, the results show that, using the bandwidth extension WaveNet model with the feature fused with the MGD spectrum, the mean opinion score (MOS) of the enhanced speech increases by 54.3% compared with the original bone-conducted speech. All the results demonstrate that the phase-fused condition input effectively supplements the magnitude spectrum alone and helps the WaveNet vocoder achieve a promising improvement in the quality of the synthesized speech.
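The concatenation of magnitude and phase-derived features described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `frame_signal` and `fusion_feature`, the frame length, the hop size, and the Hann window are all assumptions made for illustration. It also uses the plain group delay rather than the MGD spectrum named in the abstract, which additionally relies on a cepstrally smoothed spectrum and compression exponents that are omitted here.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=128):
    """Slice a 1-D signal into overlapping frames (hypothetical helper)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def fusion_feature(x, frame_len=512, hop=128, eps=1e-8):
    """Concatenate log-magnitude and plain group delay per frame.

    Assumption: plain group delay stands in for the MGD spectrum;
    frame_len and hop are illustrative values, not taken from the paper.
    """
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    n = np.arange(frame_len)
    X = np.fft.rfft(frames, axis=1)        # spectrum of x[n]
    Y = np.fft.rfft(frames * n, axis=1)    # spectrum of n * x[n]
    log_mag = np.log(np.abs(X) + eps)
    # Group delay computed without phase unwrapping:
    # tau(w) = (X_R * Y_R + X_I * Y_I) / |X(w)|^2
    group_delay = (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)
    # Fusion feature: magnitude and phase-derived parts side by side
    return np.concatenate([log_mag, group_delay], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = fusion_feature(rng.standard_normal(16000))
    print(feat.shape)
```

Each row of the returned array would serve as one frame of condition input. In this sketch the first half of a row holds the log-magnitude bins and the second half holds the group delay bins.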
Classification number: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]