Fundamental Frequency Extraction Model Using Convolutional Neural Networks with Non-local Modules (cited by: 3)

Authors: LIU Jingjing; HUANG Hao [1] (School of Information Science and Engineering, Xinjiang University, Urumqi 830017, China)

Affiliation: [1] School of Information Science and Engineering, Xinjiang University, Urumqi 830017

Source: Computer Engineering (《计算机工程》), 2023, No. 3, pp. 128-133, 160 (7 pages)

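Funding: National Key Research and Development Program of China (2020AAA0107902); National Natural Science Foundation of China (61663044, 61761041); Open Project of the Xinjiang Key Laboratory of Multilingual Information Technology (2020D04047).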

Abstract: Estimating the fundamental frequency, or pitch, is a key sub-problem in many speech signal processing techniques. Recent work largely takes a data-driven approach, extracting the fundamental frequency with a Convolutional Neural Network (CNN). However, a convolution operation processes only a local neighborhood of audio sample points at a time, so dependencies between distant sample points are captured only when convolutions are applied repeatedly, which leads to computational inefficiency and optimization difficulties. Inspired by the strong performance of non-local modules in computer vision tasks, this study proposes a CNN with non-local modules for fundamental frequency extraction. Unlike a stack of convolutional layers, a non-local module computes the relationship between any two positions directly, regardless of the distance between them, and can therefore capture long-range dependencies quickly. For pitch estimation, adding non-local modules to a CNN to compute the similarity between all audio sample points in each frame helps capture global frame-to-frame and sample-to-sample dependencies at the cost of slightly increased computational complexity. Moreover, because a non-local module leaves the input and output dimensions unchanged, it can be integrated into a CNN easily. Experimental results show that the Mean Absolute Error (MAE) of the proposed method is only 4.7, at least 0.7 lower than that of the baseline models, which is the best performance among the compared models.
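The non-local operation described in the abstract can be illustrated with a minimal sketch. It assumes the embedded-Gaussian (softmax) form of a non-local block applied to a 1-D sequence of feature vectors; the abstract does not state which variant the authors use, so this is not their implementation. The function names and the weight matrices `W_theta`, `W_phi`, `W_g`, and `W_out` are hypothetical, and plain Python lists are used instead of a tensor library to keep the example self-contained.

```python
import math


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local_block(x, W_theta, W_phi, W_g, W_out):
    """Apply one non-local block to x, a list of T feature vectors of length C.

    Assumed form (embedded Gaussian with a residual connection):
        y_i = sum_j softmax_j(theta(x_i) . phi(x_j)) * g(x_j)
        z_i = W_out y_i + x_i
    Every position i attends to every position j, however far apart they are.
    """
    theta = [matvec(W_theta, xi) for xi in x]
    phi = [matvec(W_phi, xj) for xj in x]
    g = [matvec(W_g, xj) for xj in x]
    out = []
    for i, xi in enumerate(x):
        # Pairwise similarity between position i and all positions j.
        scores = [sum(a * b for a, b in zip(theta[i], pj)) for pj in phi]
        weights = softmax(scores)
        # Weighted sum of the transformed features over all positions.
        y = [sum(w * gj[k] for w, gj in zip(weights, g))
             for k in range(len(g[0]))]
        # Project back to C channels and add the residual input.
        z = matvec(W_out, y)
        out.append([zk + xk for zk, xk in zip(z, xi)])
    return out
```

With `W_out` set to all zeros the block reduces to the identity mapping, and for any weights the output has the same length and channel count as the input, which is the property that lets such a block be inserted between existing convolutional layers.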

Keywords: fundamental frequency; speech signal processing; data-driven; convolutional neural network; non-local module

Classification: TP183 [Automation and Computer Technology - Control Theory and Control Engineering]
