Authors: Ren Kailong, Wang Yi[1], Chen Xiaodong[1], Cai Huaiyu[1] (School of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China)
Institution: [1] School of Precision Instruments and Optoelectronics Engineering, Tianjin University, Tianjin 300072
Source: Laser & Optoelectronics Progress (《激光与光电子学进展》), 2020, No. 18, pp. 374-382 (9 pages)
Abstract: A long short-term memory (LSTM) recurrent neural network model that fuses i-vector features is proposed for voice control of a laparoscopic supporter (laparoscope holder). Trained on a small set of samples, it recognizes short, isolated-word commands spoken by a specific surgeon. The model uses an LSTM recurrent neural network as its base model and Mel-frequency cepstral coefficients (MFCCs) as its input features. The i-vector is fed in as deep-layer input: it is concatenated with the deep features produced after the LSTM layer, fusing the two sets of parameters. This allows the model to accurately recognize voice commands from the designated operating surgeon while rejecting commands from anyone else, providing a safe and intelligent speech recognition scheme for laparoscopic operation. Experiments on a self-built speech corpus evaluate both the recognition performance of the proposed algorithm on speech in the training library and its rejection performance on speech outside the training library. Compared with dynamic time warping (DTW) and a Gaussian mixture model-hidden Markov model (GMM-HMM), the proposed model reaches a 99.6% correct recognition rate on the specific speaker's commands in the training library while keeping the false acceptance rate at 0%, and its average false acceptance rate on speech outside the training library is 2.5%. These results meet the accuracy and safety requirements for laparoscopic supporter control.
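The abstract describes the fusion step only at a high level. Below is a minimal pure-Python sketch of that idea, not the authors' implementation. MFCC frames pass through a single LSTM layer, the last hidden state is concatenated ("spliced") with the utterance's i-vector, and a softmax layer scores the isolated-word commands. MFCC and i-vector extraction are taken as given inputs. The layer sizes, the untrained random weights, the confidence-threshold rejection rule (`decide`) and the false-acceptance-rate definition (`false_acceptance_rate`) are all hypothetical assumptions, because the abstract does not say how rejection or FAR are computed.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

class TinyLSTM:
    """Single-layer LSTM that returns the hidden state after the last frame."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate (input, forget, output, candidate)
        # acting on the concatenated vector [x_t, h_{t-1}].
        self.w = {gate: rand_matrix(n_hidden, n_in + n_hidden, rng) for gate in "ifoc"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifoc"}

    def forward(self, frames):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in frames:
            z = list(x) + h
            pre = {gate: [a + b for a, b in zip(matvec(self.w[gate], z), self.b[gate])]
                   for gate in "ifoc"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h

class FusedCommandClassifier:
    """Hypothetical model: LSTM over MFCC frames, then [h_last ; i-vector] -> softmax."""

    def __init__(self, n_mfcc, n_hidden, n_ivector, n_commands, seed=0):
        rng = random.Random(seed)
        self.lstm = TinyLSTM(n_mfcc, n_hidden, rng)
        self.w_out = rand_matrix(n_commands, n_hidden + n_ivector, rng)
        self.b_out = [0.0] * n_commands

    def predict_proba(self, mfcc_frames, ivector):
        h_last = self.lstm.forward(mfcc_frames)
        fused = h_last + list(ivector)  # the "splicing" (concatenation) step
        logits = [a + b for a, b in zip(matvec(self.w_out, fused), self.b_out)]
        return softmax(logits)

def decide(probs, threshold):
    """Return the best command index, or None to reject (assumed threshold rule)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None

def false_acceptance_rate(decisions):
    """Assumed definition: share of out-of-library utterances not rejected."""
    accepted = sum(1 for d in decisions if d is not None)
    return accepted / len(decisions)

if __name__ == "__main__":
    rng = random.Random(1)
    model = FusedCommandClassifier(n_mfcc=13, n_hidden=8, n_ivector=4, n_commands=5)
    frames = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(20)]
    ivec = [rng.gauss(0, 1) for _ in range(4)]
    probs = model.predict_proba(frames, ivec)
    print([round(p, 3) for p in probs], decide(probs, threshold=0.9))
```

Concatenating the i-vector after the LSTM, not at the input, keeps the frame-level recurrence independent of the utterance-level speaker vector. In this sketch the output layer sees both what was said and who said it, which is what a threshold-based rejection rule would rely on. A real system would train the weights, for example with cross-entropy, instead of leaving them random.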
Keywords: medical optics; laparoscope; i-vector; long short-term memory; speaker-dependent speech recognition
Classification number (CLC): TN912 [Electronics and Telecommunications: Communication and Information Systems]