Author: ZHANG Jinning (张晋宁), Shanxi Institute of Mechanical & Electrical Engineering, Changzhi 046000, China
Source: Audio Engineering (《电声技术》), 2023, No. 11, pp. 101-104 (4 pages)
Abstract: Audio-visual speech recognition (AVSR) systems combine audio and visual information to provide reliable speech recognition. To improve the recognition accuracy of AVSR in low signal-to-noise ratio (SNR) environments, this paper proposes an AVSR system based on a recurrent neural network (RNN). The system consists of three parts: an audio feature extraction module, a visual feature extraction module, and a joint audio-visual feature module. The joint module uses the RNN to combine audio features based on Mel-frequency cepstral coefficients (MFCC) with visual information extracted by Haar cascade detection from the OpenCV library, improving the system's recognition rate. Experimental results show that, under low-SNR conditions, the proposed system maintains a correct recognition rate of about 89%.
Keywords: audio-visual speech recognition; recurrent neural network (RNN); Mel-frequency cepstral coefficients (MFCC); signal-to-noise ratio (SNR)
Classification No.: TP311.5 [Automation and Computer Technology: Computer Software and Theory]
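
The abstract describes a joint module in which an RNN combines per-frame audio features with per-frame visual information. The sketch below illustrates that general idea only: it concatenates the two feature vectors frame by frame and passes the sequence through a plain Elman RNN. It is not the paper's implementation. The fusion-by-concatenation strategy, the 13-value audio vector, the 4-value visual vector (for example, a mouth bounding box), the hidden size, and the names `fuse_frames` and `ElmanRNN` are all assumptions made for illustration, and the weights are random and untrained.

```python
import math
import random


def fuse_frames(audio_frames, visual_frames):
    """Concatenate audio and visual feature vectors frame by frame.

    Assumption: both streams are already aligned to the same frame rate.
    """
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(audio_frames, visual_frames)]


class ElmanRNN:
    """Minimal Elman RNN cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b).

    Weights are small random values (untrained); this only shows the data flow.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.w_x = [[rng.uniform(-0.1, 0.1) for _ in range(input_size)]
                    for _ in range(hidden_size)]
        self.w_h = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                    for _ in range(hidden_size)]
        self.b = [0.0] * hidden_size

    def step(self, x, h):
        """Compute the next hidden state from one fused frame and the previous state."""
        new_h = []
        for j in range(self.hidden_size):
            total = self.b[j]
            total += sum(w * xi for w, xi in zip(self.w_x[j], x))
            total += sum(w * hi for w, hi in zip(self.w_h[j], h))
            new_h.append(math.tanh(total))
        return new_h

    def forward(self, sequence):
        """Run the whole fused sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        for x in sequence:
            if len(x) != self.input_size:
                raise ValueError("frame has unexpected feature dimension")
            h = self.step(x, h)
        return h


if __name__ == "__main__":
    # Hypothetical inputs: 3 frames, 13 audio values and 4 visual values per frame.
    audio = [[0.1 * (t + 1)] * 13 for t in range(3)]
    visual = [[10.0, 20.0, 30.0, 15.0] for _ in range(3)]
    fused = fuse_frames(audio, visual)
    rnn = ElmanRNN(input_size=17, hidden_size=8)
    print(rnn.forward(fused))
```

In a real system the final hidden state would feed a trained classification layer, and the audio and visual vectors would come from an MFCC extractor and a Haar cascade detector rather than from hand-written lists.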