Partition-Time Masking: A Data Augmentation Method for Lip Reading

Authors: HU Yu[1], YIN Jibin[1]

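Affiliation: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China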

Source: Computer Science (《计算机科学》), 2024, Issue S02, pp. 473-478 (6 pages)

Abstract: This paper proposes Partition-Time Masking, a data augmentation method for lip reading. The method operates directly on the input data: it divides the input into multiple subsequences, applies a masking operation to each subsequence separately, and then concatenates the subsequences in their original order. This makes the model more robust to inputs with partially missing frames and thereby improves generalization. Five augmentation strategies are designed, differing in the number of subsequences and the source of the mask values, and are compared against Time Masking, the most important data augmentation method in lip-reading research. Experiments on the LRW and LRW1000 datasets show that Partition-Time Masking improves model performance more than Time Masking does. The best strategy uses three subsequences and masks each with the average frame of that subsequence; it raises the performance of DC-TCN, currently the best lip-reading model, from 89.6% to 90.0%.
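The procedure described in the abstract (partition, mask each subsequence, concatenate in order) can be sketched roughly as follows. This is an illustrative NumPy sketch and not the authors' implementation: the function name `partition_time_masking`, the `max_mask_len` parameter, and the choice to draw the mask length and start position uniformly at random are all assumptions, since the abstract does not specify them. It follows the strategy reported as best, with three subsequences and the mean frame of each subsequence as the mask value.

```python
import numpy as np


def partition_time_masking(frames, num_parts=3, max_mask_len=5, rng=None):
    """Mask one random span of frames inside each temporal subsequence.

    frames: array of shape (T, ...) with time on axis 0.
    num_parts: number of subsequences (3 is the best setting in the abstract).
    max_mask_len: hypothetical upper bound on masked frames per subsequence.
    """
    rng = np.random.default_rng() if rng is None else rng
    # array_split keeps temporal order and tolerates T not divisible by num_parts.
    parts = np.array_split(frames, num_parts, axis=0)
    masked_parts = []
    for part in parts:
        part = part.astype(np.float32, copy=True)  # never modify the input
        length = part.shape[0]
        if length > 0:
            # Assumed sampling scheme: uniform mask length, uniform start.
            mask_len = int(rng.integers(0, min(max_mask_len, length) + 1))
            start = int(rng.integers(0, length - mask_len + 1))
            # Mask value: mean frame of this subsequence, computed before masking.
            mean_frame = part.mean(axis=0)
            part[start:start + mask_len] = mean_frame
        masked_parts.append(part)
    # Reassemble the subsequences in their original order.
    return np.concatenate(masked_parts, axis=0)


# Usage on a synthetic grayscale clip (the shape is arbitrary, for illustration).
clip = np.random.default_rng(0).random((29, 4, 4)).astype(np.float32)
augmented = partition_time_masking(clip, rng=np.random.default_rng(1))
```

Because each subsequence receives its own mask, the masked frames are spread across the clip rather than concentrated in a single span, which is the apparent difference from plain Time Masking.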

Keywords: lip reading; Time Masking; data augmentation; visual speech recognition; DC-TCN

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
