Authors: Rui Zhang, Ruijie Meng, Jinqiu Sang, Yi Hu, Xiaodong Li, Chengshi Zheng
Affiliations: [1] Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China [2] University of Chinese Academy of Sciences, Beijing, China [3] Shanghai Institute of AI for Education, East China Normal University, Shanghai, China [4] Department of Electrical Engineering and Computer Science, University of Wisconsin–Milwaukee, Milwaukee, Wisconsin, USA
Source: CAAI Transactions on Intelligence Technology (智能技术学报(英文)), 2023, Issue 2, pp. 364-378 (15 pages)
Funding: National Key Research and Development (R&D) Program of China, Grant/Award Number: 2021YFB3201702; National Natural Science Foundation of China, Grant/Award Number: 12074403.
Abstract: The head-related transfer function (HRTF) plays a vital role in immersive virtual reality and augmented reality technologies, especially in spatial audio synthesis for binaural reproduction. This article proposes a deep learning method with generic HRTF amplitudes and anthropometric parameters as input features for individual HRTF generation. By designing fully convolutional neural networks, the key anthropometric parameters and the generic HRTF amplitudes were used to predict each individual HRTF amplitude spectrum in the full-space directions, and the interaural time delay (ITD) was predicted by the transformer module. In the amplitude prediction model, the attention mechanism was adopted to better capture the relationship between HRTF amplitude spectra at two distinct directions separated by a large angle in space. Finally, with the minimum phase model, the predicted amplitude spectra and ITDs were used to obtain a set of individual head-related impulse responses. Besides the separate training of the HRTF amplitude and ITD generation models, their joint training was also considered and evaluated. The root-mean-square error and the log-spectral distortion were selected as objective metrics to evaluate the performance. Subjective experiments further showed that the auditory source localisation performance of the proposed method was better than that of other methods in most cases.
Keywords: audio databases; augmented reality; deep learning; multimedia