Research on Lip-reading Based on BiLSTM-Attention (cited by: 2)

Authors: LIU Da-yun, FANG Guo-zhi [2], LUO Tian-yi, WEI Hua-jie, WANG Qian, LI Xiu-zheng, LI Ao [1]

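Affiliations: [1] School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China; [2] School of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China; [3] School of Automation, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China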

Source: Computing Technology and Automation, 2020, Issue 1, pp. 150-155 (6 pages)

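Funding: National Natural Science Foundation of China (61501147); Heilongjiang Province College Student Innovation and Entrepreneurship Program (20180214007).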

Abstract: To address problems in lip feature extraction and temporal relationship modeling in lip-reading, this paper proposes a deep learning model that combines a bidirectional long short-term memory network (BiLSTM) with an attention mechanism. First, the heights and widths of the lip at different positions, computed from 20 lip key points, are used as lip features. Second, a BiLSTM encodes the temporal structure of the lip feature sequence. Third, the attention mechanism learns the different weights that the lip temporal features at different time steps contribute to the overall recognition result. Finally, a Softmax classifier performs classification. The model is compared experimentally with traditional lip-reading models on the public lip-reading datasets GRID and MIRACL-VC. Accuracy improves by at least 13.4% on GRID, by at least 15.3% on the MIRACL-VC word dataset, and by at least 9.2% on the MIRACL-VC phrase dataset. The model is also compared with other encoding models, and the experimental results show that it effectively improves lip-reading accuracy.
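The attention and Softmax steps named in the abstract can be illustrated with a minimal sketch. It assumes the BiLSTM has already produced one encoded vector per video frame. The tanh-then-dot-product scoring form, the function names, and the toy numbers are all assumptions for illustration; the abstract does not specify the exact formulation the authors used.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, score_vector):
    """Weight each time step and sum the states into one context vector.

    states: T x d list of encoded frames (assumed to come from a BiLSTM).
    score_vector: length-d vector, learned in a real model (hypothetical here).
    """
    # One scalar score per time step: tanh(h_t) dotted with the score vector.
    scores = [
        sum(math.tanh(h_i) * w_i for h_i, w_i in zip(h, score_vector))
        for h in states
    ]
    # Normalise the scores into attention weights that sum to 1.
    alphas = softmax(scores)
    dim = len(states[0])
    # Context vector: attention-weighted sum of the encoded frames.
    context = [sum(a * h[i] for a, h in zip(alphas, states)) for i in range(dim)]
    return alphas, context


def classify(context, weights, bias):
    """Linear layer followed by softmax over the output classes."""
    logits = [
        sum(w_i * c_i for w_i, c_i in zip(row, context)) + b_k
        for row, b_k in zip(weights, bias)
    ]
    return softmax(logits)


# Toy input: 3 time steps, 2 features each (made-up numbers, not real data).
states = [[0.1, -0.2], [0.9, 0.4], [0.0, 0.1]]
score_vector = [1.0, 0.5]
alphas, context = attention_pool(states, score_vector)
probs = classify(
    context,
    weights=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    bias=[0.0, 0.0, 0.0],
)
```

In this toy input the second frame receives the largest attention weight because its score is highest, which is the behaviour the abstract describes: frames that matter more for recognition contribute more to the pooled representation.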

Keywords: lip-reading; bidirectional long short-term memory network; attention mechanism; deep learning; temporal encoding

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
