Authors: WEI Jin-tai (魏金太); GAO Qiong (高穹)
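Affiliations: [1] Department of Information and Art Design, Henan Forestry Vocational College, Luoyang 471002, Henan, China; [2] Luoyang Electronic Equipment Testing Center, Luoyang 471003, Henan, China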
Source: Journal of North University of China (Natural Science Edition) (《中北大学学报(自然科学版)》), 2021, No. 1, pp. 34-39 (6 pages)
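Funding: National Natural Science Foundation of China (11404398); Key Research Project of the Henan Provincial Department of Science and Technology (142102210097).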
Abstract: To improve the segmentation of naturally occurring overlapping speech in real conversations, a deep convolutional neural network (DCNN) was trained end to end on relatively low-level log-scaled Mel-spectrograms computed from monaural audio. The DCNN was trained on real conversational data from the Fisher English Corpus, while maintaining and testing its generalizability to ordinary conversational scenarios. To alleviate severe class imbalance, silence was removed from the training set and the majority class was uniformly randomly sampled during training. In addition, the Viterbi algorithm was used to perform temporal smoothing, which enhanced the final segmentation. On more than 91 h of conversations, detection precision exceeded 60% and recall exceeded 29%, demonstrating the applicability of deep learning to this task.
Keywords: overlapping speech; deep convolutional neural network; conversation analysis; speech segmentation; class imbalance
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
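
Note: The abstract mentions Viterbi-based temporal smoothing of the segmentation but gives no details. The following is only a minimal sketch of how such smoothing is commonly done, not the authors' implementation. It assumes per-frame class posteriors from a classifier and a hypothetical "sticky" transition matrix controlled by an illustrative parameter `stay_prob`; the function name `viterbi_smooth` and all values are assumptions.

```python
import numpy as np

def viterbi_smooth(posteriors, stay_prob=0.9):
    """Return the most likely label sequence for per-frame posteriors.

    posteriors: array-like of shape (T, K), K >= 2, rows are class probabilities.
    stay_prob:  assumed probability of keeping the same label between frames
                (hypothetical value, not taken from the paper).
    """
    post = np.asarray(posteriors, dtype=float)
    T, K = post.shape
    log_emit = np.log(post + 1e-12)  # small constant avoids log(0)

    # Sticky transition matrix: stay_prob on the diagonal,
    # the remaining mass spread evenly over the other classes.
    trans = np.full((K, K), (1.0 - stay_prob) / (K - 1))
    np.fill_diagonal(trans, stay_prob)
    log_trans = np.log(trans)

    score = np.empty((T, K))           # best log-score ending in each state
    back = np.zeros((T, K), dtype=int)  # best predecessor for each state
    score[0] = log_emit[0] - np.log(K)  # uniform prior over initial state

    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans  # cand[i, j]: state i -> j
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(K)] + log_emit[t]

    # Trace the best path backwards from the final frame.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```

With class 0 as "no overlap" and class 1 as "overlap", an isolated single-frame spike toward class 1 is typically smoothed away, while a sustained run of high class-1 posteriors is kept as a segment. A larger `stay_prob` smooths more aggressively.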