Authors: LIU Da-yun (刘大运); FANG Guo-zhi (房国志) [2]; LUO Tian-yi (骆天依); WEI Hua-jie (魏华杰); WANG Qian (王倩); LI Xiu-zheng (李修政); LI Ao (李骜) [1]
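Affiliations: [1] School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China; [2] School of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China; [3] School of Automation, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China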
Source: Computing Technology and Automation (《计算技术与自动化》), 2020, No. 1, pp. 150-155 (6 pages)
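Funding: National Natural Science Foundation of China (61501147); Heilongjiang Province College Student Innovation and Entrepreneurship Program (20180214007).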
Abstract: To address problems in lip feature extraction and temporal relation recognition in lip reading, a deep learning model is proposed that combines a bidirectional long short-term memory network (BiLSTM) with an attention mechanism. First, the heights and widths of the lip at different positions, computed from 20 lip key points, are taken as the lip features. Second, a BiLSTM encodes the sequence of lip features over time. Third, the attention mechanism learns how much the temporal lip features at each time step contribute to the overall recognition result. Finally, a Softmax classifier produces the prediction. The model is compared experimentally with conventional lip-reading models on the public lip-reading datasets GRID and MIRACL-VC. Accuracy improves by at least 13.4% on GRID, by at least 15.3% on the MIRACL-VC word dataset, and by at least 9.2% on the MIRACL-VC phrase dataset. The model is also compared with other encoding models, and the experimental results show that it effectively improves lip-reading accuracy.
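The attention and Softmax steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the BiLSTM encoder is omitted and its per-frame outputs are replaced by a toy list `hidden_states`, while the dot-product scoring, the `query` vector, and the `class_weights` matrix are all assumptions introduced only for illustration (the record does not say which attention scoring function the paper uses).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h_t . query), then sum over time.

    hidden_states: list of T vectors (stand-ins for BiLSTM outputs).
    query: hypothetical learned context vector of the same dimension.
    Returns (weights, context).
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context

def classify(context, class_weights):
    """Linear layer (no bias, an assumption) followed by softmax over classes."""
    logits = [sum(w_i * c_i for w_i, c_i in zip(row, context))
              for row in class_weights]
    return softmax(logits)

# Toy inputs: 3 time steps, 2-dimensional encodings, 3 output classes.
hidden_states = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.8]]
query = [1.0, 0.5]
class_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

weights, context = attention_pool(hidden_states, query)
probs = classify(context, class_weights)
```

In this sketch, `weights` plays the role of the per-time-step importance the abstract attributes to the attention mechanism, and `probs` is the Softmax output over the candidate words or phrases. In a trained model, `query` and `class_weights` would be learned jointly with the encoder.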
Keywords: lip reading; bidirectional long short-term memory network; attention mechanism; deep learning; temporal encoding
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]