Main melody extraction algorithm based on top-level feedback and joint detection  (Cited by: 2)


Authors: NI Jiahui; JIN Wenqing; HUANG Rong; HAN Fang[1,2]

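Affiliations: [1] College of Information Science and Technology, Donghua University, Shanghai 201620, China; [2] Engineering Research Center of Digitized Textile & Apparel Technology of Ministry of Education (Donghua University), Shanghai 201620, China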

Source: Journal of Computer Applications, 2021, Issue S02, pp. 103-107 (5 pages)

Funding: National Natural Science Foundation of China (11572084, 11972115).

Abstract: In music information retrieval (MIR), main melody extraction is a relatively difficult task, because the convolutional neural network-conditional random field (CNN-CRF) model cannot properly handle the relationship between vocal and non-vocal frames near their boundaries. To address this problem, a main melody extraction algorithm based on top-level feedback and joint detection, named CRNN-TFB (Convolutional Recurrent Neural Network-Top-level FeedBack), is proposed. Joint detection addresses the dual-objective problem of voice detection and pitch classification. It can be regarded as multi-task learning, but unlike general multi-task networks, a feedback network is added on top of the main melody extraction network to strengthen singing voice detection and to use the singing voice detection results to reinforce the musical information in the melody features. On the MIR-1K and MIREX05 datasets, CRNN-TFB outperforms the comparison algorithms in overall performance. Experimental results show that CRNN-TFB effectively reduces the octave error rate and is clearly better than the comparison methods in voicing recall rate (VR), raw pitch accuracy (RPA), and overall accuracy (OA).
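The three metrics named in the abstract (VR, RPA, OA) are commonly computed frame by frame from a reference pitch track and an estimated pitch track. The sketch below follows the usual MIREX-style definitions and is not taken from the paper: it assumes pitch is given in Hz per frame, that 0 marks an unvoiced frame, and that a pitch counts as correct within 50 cents of the reference. The function name `melody_metrics` and the tolerance parameter `tol_cents` are illustrative choices.

```python
import math


def cents_diff(f_est, f_ref):
    """Absolute pitch distance between two frequencies (Hz), in cents."""
    return abs(1200.0 * math.log2(f_est / f_ref))


def melody_metrics(ref_hz, est_hz, tol_cents=50.0):
    """Compute VR, RPA and OA from two per-frame pitch tracks.

    Assumptions (not from the paper): a value of 0 means "unvoiced",
    and a pitch is correct if it lies within tol_cents of the reference.
    """
    if len(ref_hz) != len(est_hz):
        raise ValueError("reference and estimate must have the same length")

    n_ref_voiced = 0   # frames where the reference contains melody
    n_recalled = 0     # reference-voiced frames also estimated as voiced
    n_pitch_ok = 0     # reference-voiced frames with a correct pitch
    n_overall_ok = 0   # frames correct in both voicing and pitch

    for r, e in zip(ref_hz, est_hz):
        ref_voiced = r > 0
        est_voiced = e > 0
        if ref_voiced:
            n_ref_voiced += 1
            if est_voiced:
                n_recalled += 1
                if cents_diff(e, r) <= tol_cents:
                    n_pitch_ok += 1
                    n_overall_ok += 1
        elif not est_voiced:
            # Correctly detected as unvoiced.
            n_overall_ok += 1

    n_frames = len(ref_hz)
    return {
        "VR": n_recalled / n_ref_voiced if n_ref_voiced else 0.0,
        "RPA": n_pitch_ok / n_ref_voiced if n_ref_voiced else 0.0,
        "OA": n_overall_ok / n_frames if n_frames else 0.0,
    }


# Toy example: frame 2 is an octave error, frame 3 is a missed voiced
# frame, and frame 4 is a voicing false alarm.
ref = [0.0, 220.0, 220.0, 440.0, 0.0]
est = [0.0, 220.0, 440.0, 0.0, 110.0]
scores = melody_metrics(ref, est)
```

In this toy example the octave error in frame 2 still counts toward VR, because the frame was detected as voiced, but it lowers both RPA and OA. This is why a reduction in octave errors tends to show up in those two metrics.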

Keywords: main melody extraction; music information retrieval; convolutional neural network; feature extraction; music signal processing

Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]

 
