Sound Source Arrival Direction Estimation Based on GRU and Self-attentive Network

Authors: HE Ruhan (何儒汉), CHEN Yifan (陈一帆) [1,2], YU Yongsheng (余永升), JIANG Aisen (姜艾森) [4]

Affiliations: [1] Hubei Provincial Engineering Research Center for Intelligent Textile and Fashion, Wuhan 430200, China; [2] School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China; [3] State Key Laboratory of Silicate Materials for Architectures, Wuhan University of Technology, Wuhan 430070, China; [4] Science and Technology Institute, Wuhan Textile University, Wuhan 430200, China

Source: Computer Science (《计算机科学》), 2023, Issue S02, pp. 986-992 (7 pages)

Funding: General Program of the National Natural Science Foundation of China (61170093).

Abstract: Neural-network-based sound source localization has attracted wide attention in recent years, but mitigating the loss of implicit direction-of-arrival (DOA) information and coping with small training datasets remain open challenges. This paper therefore proposes a DOA estimation method based on a GRU and a self-attention network. The method uses a GRU, which performs well on small datasets, as the backbone network, compensating for the difficulty of collecting clean sound data. A training set is built from multichannel recordings of sound sources. Short-time Fourier transform (STFT) feature extraction yields Mel spectrograms and acoustic intensity vectors, and the input features are then formed by stacking multichannel spectrograms with normalized principal eigenvectors. Compared with combining spectrograms with GCC-PHAT features, this avoids corrupting the implicit DOA information and thus effectively mitigates its loss. These features are fed into a convolutional recurrent neural network (CRNN), whose parameters are learned by supervised training. The model regresses each DOA estimate as 3D Cartesian coordinates, and a self-attention network is added that is trained jointly through back-propagation: while training, the network computes the loss and predicts an association matrix, which resolves the optimal assignment between predicted and reference locations. Experimental results show that the network achieves high localization accuracy and robustness under different reverberation conditions and signal-to-noise ratios.
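The abstract states that the input stacks multichannel spectrograms with normalized principal eigenvectors but gives no formula. The following is a minimal sketch under stated assumptions, not the authors' implementation. As in SALSA-style features, it assumes that the principal eigenvector is taken per frequency bin from a spatial covariance matrix averaged over a few STFT frames. It finds that eigenvector by power iteration instead of a full eigendecomposition. It also assumes that "normalized" means unit norm with the phase of channel 0 removed; the paper's actual normalization may differ. All function names are hypothetical.

```python
import cmath
import math


def spatial_covariance(frames):
    """Estimate R = (1/T) * sum_t x_t x_t^H from T multichannel STFT vectors
    x_t (one complex value per channel) taken at a single frequency bin."""
    m = len(frames[0])
    r = [[0j] * m for _ in range(m)]
    for x in frames:
        for i in range(m):
            for j in range(m):
                r[i][j] += x[i] * x[j].conjugate()
    t = len(frames)
    return [[v / t for v in row] for row in r]


def principal_eigenvector(r, iters=100):
    """Power iteration: repeatedly apply R and renormalize. This converges to
    the eigenvector of the largest eigenvalue when that eigenvalue is dominant
    and the all-ones start vector is not orthogonal to it."""
    m = len(r)
    v = [1.0 + 0j] * m
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(abs(c) ** 2 for c in w))
        if norm == 0.0:
            return v
        v = [c / norm for c in w]
    return v


def normalized_principal_eigenvector(frames):
    """Hypothetical normalization: unit norm, phase referenced to channel 0,
    so the feature no longer depends on the arbitrary complex scale of v."""
    v = principal_eigenvector(spatial_covariance(frames))
    ref_phase = cmath.phase(v[0])
    return [c * cmath.exp(-1j * ref_phase) for c in v]


if __name__ == "__main__":
    # Rank-1 check: frames x_t = s_t * a for a hypothetical 4-channel steering vector a
    a = [cmath.exp(-1j * p) for p in (0.0, 0.7, -1.3, 2.1)]
    frames = [[s * ai for ai in a] for s in (1.0, 0.5 - 0.2j, -0.8j, 0.3 + 0.9j)]
    v = normalized_principal_eigenvector(frames)
    print(max(abs(vi - ai / 2) for vi, ai in zip(v, a)))  # close to 0: v equals a / ||a||
```

In the check, every frame is the same steering vector scaled by a source signal. The covariance matrix therefore has rank one, and its principal eigenvector is a / ||a|| (here ||a|| = 2). The phase referencing removes the arbitrary complex factor that any eigenvector carries.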
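The abstract describes two things: regressing each DOA as a 3D Cartesian vector, and resolving the optimal assignment between predicted and reference locations. The paper does the assignment with a learned self-attention network that predicts an association matrix. The sketch below does not reproduce that network. Instead, it computes an exact assignment of the kind such a network is typically trained to reproduce. It uses exhaustive search over permutations as a simple stand-in for the Hungarian algorithm, with the angular distance between Cartesian DOA vectors as the cost. The azimuth/elevation convention (azimuth counterclockwise from the x-axis, elevation up from the xy-plane), the association-matrix layout, and all function names are assumptions.

```python
import itertools
import math


def to_cartesian(azimuth_deg, elevation_deg):
    """Unit DOA vector (x, y, z) from azimuth/elevation in degrees (assumed convention)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def angular_distance_deg(u, v):
    """Great-circle angle between two nonzero DOA vectors. Inputs are
    renormalized because a regression output need not have unit length."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def optimal_assignment(pred, ref):
    """Exhaustive search for the prediction-to-reference matching with the
    smallest total angular error. Assumes len(pred) >= len(ref); the cost
    grows factorially, so this only suits a handful of sources."""
    best_cost, best_match = math.inf, None
    for perm in itertools.permutations(range(len(pred)), len(ref)):
        cost = sum(angular_distance_deg(pred[p], ref[r]) for r, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_match = cost, perm
    # best_match[r] is the index of the prediction assigned to reference r
    return best_match, best_cost


def association_matrix(match, n_pred, n_ref):
    """Binary n_ref x n_pred matrix with A[r][p] = 1 when prediction p is
    matched to reference r (hypothetical layout of an association target)."""
    a = [[0] * n_pred for _ in range(n_ref)]
    for r, p in enumerate(match):
        a[r][p] = 1
    return a


if __name__ == "__main__":
    ref = [to_cartesian(30, 0), to_cartesian(-90, 20)]
    pred = [to_cartesian(-85, 18), to_cartesian(28, 3)]
    match, cost = optimal_assignment(pred, ref)
    print(match)  # → (1, 0)
    print(association_matrix(match, len(pred), len(ref)))  # → [[0, 1], [1, 0]]
```

Regressing Cartesian vectors rather than raw angles avoids the wrap-around at ±180° azimuth. Renormalizing inside `angular_distance_deg` keeps the cost meaningful even when a predicted vector drifts off the unit sphere.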

Keywords: sound source direction-of-arrival estimation; GRU; convolutional neural network; recurrent neural network; self-attention

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]

 
