Deep Learning Model Fusing Multiple Acoustic Features for Voice Automatic Editing


Authors: LIU Chen [1]; NI Ren-jie; ZHOU Li-xin [1]; HOU Chang-you

Affiliations: [1] Business School, University of Shanghai for Science and Technology, Shanghai 200093, China; [2] Shanghai Media Group, Shanghai 200125, China

Source: Journal of Chinese Computer Systems, 2023, No. 8, pp. 1713-1719 (7 pages)

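Funding: Supported by the National Natural Science Foundation of China, General Program (71774111), and the China Postdoctoral Science Foundation, 69th Batch General Program (2021M692135).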

Abstract: Editing is an important stage of audio and video production. Editors must weigh factors such as editing rhythm and relevance, which costs considerable labor and time. Starting from the characteristics of editing and its practical application, this paper proposes a deep learning model (CNN-BiGRU) that fuses multiple acoustic features for automatic voice editing. The model identifies the speech portions of media and edits them automatically in an artistic manner. It extracts three features: the log-mel spectrogram, short-time energy, and short-time zero-crossing rate. These are fused by multiple convolutional neural networks (CNNs), and the fused result is fed into a bidirectional gated recurrent unit network (BiGRU). The model is trained with a curriculum-learning approach that presents data in larger forms first and smaller forms afterward. Experimental results show that, compared with traditional machine learning editing models, the proposed model combines global and local information more effectively and is more robust. Its accuracy on the CHiME-5 test set reaches 98.36%, and its editing results are very close to manual editing while greatly reducing editing time.
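To make two of the three features named in the abstract concrete, the following is a minimal sketch of frame-wise short-time energy and short-time zero-crossing rate. It is not the authors' implementation: the function names, the 400-sample frame length, and the 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate) are illustrative assumptions, and the log-mel spectrogram is omitted because it needs an FFT and a mel filterbank.

```python
import math

# All names and default values below are illustrative assumptions,
# not taken from the paper.

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Sum of squared amplitudes within one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def frame_features(samples, frame_len=400, hop=160):
    """Return one (energy, zero-crossing rate) pair per frame."""
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in frame_signal(samples, frame_len, hop)]

if __name__ == "__main__":
    sr = 16000  # assumed sampling rate in Hz
    # 0.1 s of a 440 Hz tone followed by 0.1 s of digital silence.
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 10)]
    silence = [0.0] * (sr // 10)
    feats = frame_features(tone + silence)
    print("first frame:", feats[0])
    print("last frame:", feats[-1])
```

In this toy signal the tone frames carry high energy and the silent frames carry none, which is the kind of contrast a speech/non-speech detector can exploit. How the paper normalizes these features or aligns them with the log-mel spectrogram before CNN fusion is not stated in the abstract.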

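Keywords: voice editing; acoustic feature fusion; curriculum learning; bidirectional gated recurrent neural network; convolutional neural network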

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
