Lightweight Chinese Lipreading Model and Dataset Construction

Authors: SUN Baosheng; XIE Dongliang (School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China)

Affiliation: [1] School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876

Source: Journal of Beijing University of Posts and Telecommunications, 2023, No. 4, pp. 58-63 (6 pages)

Abstract: To promote the rapid development and practical application of Chinese lipreading, a lightweight lipreading model is proposed that combines interleaved group convolution with dilated convolution. The interleaved group convolution learns the correlations between different features, and the dilated convolution enlarges the model's receptive field. Together they substantially reduce the model's parameter count and complexity while improving recognition accuracy. To address the scarcity of Chinese lipreading datasets, the authors also record a sentence-level Chinese lipreading dataset in a controlled environment, which they describe as the largest of its kind. Experiments on the recorded dataset and on public datasets verify the applicability of the lightweight model and demonstrate its effectiveness. Heatmap visualization is used to analyze how well the model learns the mapping between video frames and text.
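The two savings named in the abstract can be illustrated with simple arithmetic. The sketch below is not the authors' architecture: the abstract gives no layer sizes, so the channel counts, group counts, kernel sizes, and dilation rates are assumptions chosen for illustration, and all function names are hypothetical. It assumes the common form of interleaved group convolution (a grouped spatial convolution, a channel permutation, then a grouped 1x1 convolution) and stride-1 dilated layers.

```python
def conv_params(c_in, c_out, kernel_size, groups=1):
    """Weight count of a 2-D convolution with a square kernel (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * kernel_size * kernel_size


def igc_params(primary_groups, channels_per_group, kernel_size):
    """Weight count of one interleaved group convolution block (assumed form).

    Primary stage: `primary_groups` groups with a spatial kernel.
    Secondary stage: `channels_per_group` groups with a 1x1 kernel,
    applied after the channels have been interleaved.
    """
    channels = primary_groups * channels_per_group
    primary = conv_params(channels, channels, kernel_size, groups=primary_groups)
    secondary = conv_params(channels, channels, 1, groups=channels_per_group)
    return primary + secondary


def interleave(channels, groups):
    """Channel order after the permutation between the two stages.

    Channel i of every primary group is gathered into one secondary group,
    so each secondary group mixes information from all primary groups.
    """
    per_group = channels // groups
    return [g * per_group + i for i in range(per_group) for g in range(groups)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions along one axis."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the view by (k - 1) * d
    return rf


if __name__ == "__main__":
    # Hypothetical layer: 64 channels, 3x3 kernel, 8 primary groups of 8 channels.
    print("standard conv weights:", conv_params(64, 64, 3))
    print("interleaved group conv weights:", igc_params(8, 8, 3))
    print("channel order for 6 channels, 2 groups:", interleave(6, 2))
    # Three 3-tap layers: plain versus dilation rates 1, 2, 4 (assumed rates).
    print("receptive field, no dilation:", receptive_field(3, [1, 1, 1]))
    print("receptive field, dilated:", receptive_field(3, [1, 2, 4]))
```

With these assumed sizes, the standard convolution has 36864 weights and the interleaved block 5120, while three dilated layers cover 15 positions instead of 7 with the same number of weights. The figures illustrate the mechanism only and are not results from the paper.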

Keywords: Chinese lipreading; lightweight; interleaved group convolution; dilated convolution

Classification code: TN911.73 [Electronics and Telecommunications: Communication and Information Systems]

 
