Detecting Overlapping Speech Segments in Conversations Using Deep Learning

Cited by: 2


Authors: WEI Jin-tai [1]; GAO Qiong [2]

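Affiliations: [1] Department of Information and Art Design, Henan Forestry Vocational College, Luoyang 471002, Henan, China; [2] Luoyang Electronic Equipment Testing Center, Luoyang 471003, Henan, China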

Source: Journal of North University of China (Natural Science Edition), 2021, No. 1, pp. 34-39 (6 pages)

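Funding: National Natural Science Foundation of China (11404398); Key Research Project of the Henan Provincial Department of Science and Technology (142102210097).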

Abstract: To improve the segmentation of naturally occurring overlapping speech in real conversations, a deep convolutional neural network (DCNN) was trained end-to-end on relatively low-level log-scaled Mel-spectrograms computed from monaural audio. The DCNN was trained on real conversational data from the Fisher English Corpus, and its generalizability to real conversational scenarios was preserved and tested. To alleviate severe class imbalance, silence was excluded from training and the majority class was uniformly randomly sampled during training. In addition, temporal smoothing with the Viterbi algorithm was applied to enhance the final segmentation. Precision above 60% and recall above 29% on more than 91 h of conversations demonstrate the applicability of deep learning to this task.
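The temporal smoothing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a two-state model (0 = no overlap, 1 = overlap), per-frame overlap probabilities from some classifier, and a single symmetric "stay" probability. The function name `viterbi_smooth`, the parameter `stay_prob`, and its default value are hypothetical choices for illustration, not the settings used in the paper.

```python
import math


def viterbi_smooth(overlap_probs, stay_prob=0.9):
    """Decode the most likely 0/1 label sequence from per-frame P(overlap).

    overlap_probs: list of floats in [0, 1], one per frame (assumed input).
    stay_prob: assumed probability of remaining in the same state.
    """
    if not overlap_probs:
        return []

    eps = 1e-12
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)

    def emit(p):
        # Log-likelihood of the frame under state 0 and state 1.
        p = min(max(p, eps), 1.0 - eps)
        return (math.log(1.0 - p), math.log(p))

    # score[s] = best log-score of any path ending in state s.
    # A uniform prior over the first state is a constant and is omitted.
    score = list(emit(overlap_probs[0]))
    backptrs = []
    for p in overlap_probs[1:]:
        e = emit(p)
        new_score, ptr = [0.0, 0.0], [0, 0]
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            if stay >= switch:
                new_score[s], ptr[s] = stay + e[s], s
            else:
                new_score[s], ptr[s] = switch + e[s], 1 - s
        score = new_score
        backptrs.append(ptr)

    # Backtrack from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

With these assumed settings, an isolated high-probability frame surrounded by low-probability frames is relabeled as non-overlap, while a sustained run of high-probability frames is kept as one overlap segment. A larger `stay_prob` smooths more aggressively.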

Keywords: overlapping speech; deep convolutional neural network; conversation analysis; speech segmentation; class imbalance

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
