Speech enhancement method of laser microphone based on ResUnet and TFGAN network  Cited by: 3

Original title: 基于ResUnet和TFGAN网络的激光麦克风语音增强方法


Authors: Dai Xinxue (代欣学), Fan Songtao (范松涛) [1], Zhou Yan (周燕) [1,2]

Affiliations: [1] Optoelectronics System Laboratory, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China

Source: Infrared and Laser Engineering (《红外与激光工程》), 2023, No. 10, pp. 73-83 (11 pages)

Abstract: A laser microphone is a technique that uses the optical Doppler effect to acquire far-field speech. Its speech quality is affected by the characteristics of the detection system itself, the optical detection path, and the target object. To obtain higher-quality speech from targets in a distant sound field, this paper measures the acoustically induced vibration frequency response of four typical targets (a sheet of A4 paper, an A4 paper box, a corrugated box, and a plastic bottle) through single-frequency acoustic excitation experiments, and finds that the response is non-uniform across frequency. On this basis, a laser speech enhancement method based on the ResUnet and TFGAN networks is proposed: the ResUnet network predicts a denoised mel spectrogram, and the TFGAN network recovers the time-domain waveform of the laser speech from the predicted mel spectrogram. Long-range speech acquisition experiments were then carried out on the four targets with a laboratory-built laser microphone. The captured speech was processed with the proposed method and compared with the nonlinear function harmonic reconstruction method and the DNN+harmonic reconstruction method. Finally, the processed laser speech was evaluated quantitatively using objective speech quality assessment (PESQ) and time-domain segmented signal-to-noise ratio (SNRseg). The results show that, for laser speech captured on the four targets, neither the nonlinear function harmonic reconstruction method nor the DNN+harmonic reconstruction method noticeably improved speech quality, and the corresponding PESQ and SNRseg scores did not increase noticeably. After processing with the proposed ResUnet+TFGAN method, the laser speech achieved higher PESQ and SNRseg scores and its quality improved markedly. The proposed method therefore provides better laser speech enhancement in laser microphone applications. The results also show that, on targets with poor frequency response consistency, the method can still reconstruct the spectrum well and recover high-quality speech.

Objective: A laser microphone is a device that uses the optical Doppler effect to acquire acoustic vibration information (speech). Compared with conventional microphones, laser microphones offer extended range, high precision, and non-contact operation. They can collect distant sound field information directionally while avoiding interference from the sound field close to the equipment. However, when a laser microphone is used to collect speech from a remote sound field, the quality of the captured speech is affected by many factors and degrades severely. Research on speech enhancement algorithms for laser microphone speech is still at a preliminary stage. Traditional single-channel speech enhancement methods require the signal and noise to satisfy stationarity or correlation conditions, and their performance drops significantly under complex conditions such as low signal-to-noise ratio and non-stationary noise. Methods based on deep neural networks can learn the complex mapping between noisy speech and clean speech, and they outperform the traditional methods. However, they generalize poorly to laser speech from complex targets in environments that were not set up in advance, because different targets have different frequency response characteristics. Therefore, to improve the quality of far-field speech captured by laser microphones, this paper proposes a laser microphone speech enhancement method based on the ResUnet network and the TFGAN network.

Methods: Using a laboratory-made laser microphone, remote speech acquisition tests were carried out on four different types of objects (Fig.6). The recorded speech was processed with the method described in this paper and compared with the nonlinear function harmonic reconstruction and DNN+harmonic reconstruction methods (Fig.9). Finally, objective speech quality assessment (PESQ) and time-domain segmented signal-to-noise ratio (S
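
The abstract names the time-domain segmented signal-to-noise ratio (SNRseg) as one of its two evaluation metrics but does not give its definition. The following is a minimal sketch of one commonly used form, not the authors' implementation: the per-frame SNR in dB is computed between a clean reference and the processed signal, clamped to a fixed range, and averaged over frames. The function name `segmental_snr`, the frame length, the hop size, and the clamping range of -10 to 35 dB are all assumptions chosen for illustration.

```python
import math


def segmental_snr(clean, processed, frame_len=512, hop=256,
                  min_db=-10.0, max_db=35.0, eps=1e-10):
    """Average per-frame SNR in dB between a reference and a processed signal.

    All default values are illustrative assumptions, not taken from the paper.
    """
    n = min(len(clean), len(processed))
    if n < frame_len:
        raise ValueError("signals must cover at least one full frame")

    frame_snrs = []
    for start in range(0, n - frame_len + 1, hop):
        signal_energy = 0.0
        error_energy = 0.0
        for i in range(start, start + frame_len):
            signal_energy += clean[i] ** 2
            error_energy += (clean[i] - processed[i]) ** 2
        # eps keeps the ratio finite when a frame is silent or error-free.
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        # Clamp so that a few extreme frames do not dominate the mean.
        frame_snrs.append(min(max(snr_db, min_db), max_db))

    return sum(frame_snrs) / len(frame_snrs)


# Synthetic check: a 440 Hz tone and a copy with a 10% amplitude error.
fs = 16000
clean = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
scaled = [1.1 * x for x in clean]
score = segmental_snr(clean, scaled)  # error energy is 1% of signal energy
```

Averaging in the dB domain frame by frame, rather than computing one global ratio, weights quiet and loud segments more evenly, which is the usual reason for preferring a segmental measure on speech.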

Keywords: heterodyne interference; speech enhancement; neural network; acoustically induced vibration

Classification code: O439 [Mechanical Engineering: Optical Engineering]

 
