Developing phoneme-based lip-reading sentences system for silent speech recognition  


Authors: Randa El-Bialy, Daqing Chen, Souheil Fenghour, Walid Hussein, Perry Xiao, Omar H. Karam, Bo Li

Affiliations: [1] School of Engineering, London South Bank University, London, UK [2] Faculty of Informatics and Computer Science, British University in Egypt, Cairo, Egypt [3] School of Electronics and Informatics, Northwestern Polytechnical University, Xi'an, China

Source: CAAI Transactions on Intelligence Technology (智能技术学报(英文)), 2023, Issue 1, pp. 129-138 (10 pages)

Abstract: Lip-reading is the process of interpreting speech by visually analysing lip movements. Recent research in this area has shifted from simple word recognition to lip-reading sentences in the wild. This paper uses phonemes as the classification schema for lip-reading sentences, both to explore an alternative schema and to improve system performance. Other classification schemas, including character-based and viseme-based schemas, have also been investigated. The visual front-end of the system consists of a spatial-temporal (3D) convolution followed by a 2D ResNet. The phoneme recognition models are Transformers that use multi-headed attention. A Recurrent Neural Network serves as the language model. The proposed system has been evaluated on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art approaches to lip-reading sentences, the proposed system achieves a 10% lower word error rate on average under varying illumination ratios.

Keywords: deep learning; deep neural networks; lip-reading; phoneme-based lip-reading; spatial-temporal convolution; transformers

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
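
The abstract reports its headline result as word error rate (WER). As a reading aid, the following is a minimal sketch of the standard WER definition: the word-level edit distance (substitutions, deletions, insertions) between a reference sentence and a predicted sentence, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration; the record does not describe the paper's actual scoring script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only (hypothetical name, not taken from the paper).
    Assumes words are separated by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of ref_word
                curr[j - 1] + 1,                  # insertion of hyp_word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A lower value is better, and the value can exceed 1.0 when the hypothesis contains many insertions.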

 
