Video-based Chinese continuous sign language recognition algorithm    Cited by: 1

Authors: LIU Shiyao, ZHANG Zhongmin [1] (College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China)

Affiliation: [1] College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, Heilongjiang, China

Source: Applied Science and Technology (《应用科技》), 2023, No. 3, pp. 44-49 (6 pages)

Abstract: Continuous sign language recognition, which converts a video sequence into a gloss sequence, is a typical weakly supervised problem: only sentence-level labels are provided, without frame-level labels that carry temporal boundaries. Connectionist temporal classification (CTC) is currently the most widely used way to handle this. In this work, a temporal-spatial attention mechanism (a convolutional block attention module, CBAM) is introduced in the feature extraction stage, an auxiliary alignment module is added, and the feature extraction part is trained with an aggregation cross-entropy (ACE) loss. The model is trained end to end, jointly combining the ACE loss for segment-level feature learning with CTC for global sequence feature learning. In addition, a random frame-dropping mechanism is proposed to further alleviate overfitting. The effectiveness of the improved method is verified on the Chinese sign language dataset CSLR, achieving a 6.9% word error rate (WER) on the validation set and 4.3% on the test set.
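To make the joint objective concrete, the following is a minimal PyTorch sketch (not the authors' released code) of how a CTC term for global sequence learning can be combined with an ACE-style term that supervises per-class occurrence frequencies. The tensor shapes, the blank index, and the equal weighting of the two terms are illustrative assumptions.

```python
# Sketch of a joint CTC + ACE objective for continuous sign language recognition.
# Assumptions: log_probs has shape (T, N, C) (time, batch, classes) and index 0 is the blank.
import torch
import torch.nn.functional as F

def ace_loss(log_probs, targets, target_lengths, blank=0):
    """Aggregation cross-entropy over per-class occurrence frequencies.

    log_probs:      (T, N, C) frame-level log-probabilities.
    targets:        (N, S) padded gloss label sequences (long).
    target_lengths: (N,) true label lengths.
    """
    T, N, C = log_probs.shape
    probs = log_probs.exp()                       # (T, N, C)
    agg = probs.sum(dim=0) / T                    # predicted class frequencies, (N, C)

    # Reference frequencies: count of each gloss in the target sequence,
    # with the remaining frames assigned to the blank class.
    counts = torch.zeros(N, C, device=log_probs.device)
    for n in range(N):
        labels = targets[n, : target_lengths[n]]
        counts[n].scatter_add_(0, labels, torch.ones_like(labels, dtype=counts.dtype))
    counts[:, blank] = T - target_lengths.to(counts.dtype)
    ref = counts / T

    # Cross-entropy between reference and predicted frequency distributions.
    return -(ref * (agg + 1e-10).log()).sum(dim=1).mean()

def joint_loss(log_probs, targets, input_lengths, target_lengths, ace_weight=1.0):
    """CTC for global sequence supervision plus ACE for segment-level supervision."""
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=0, zero_infinity=True)
    ace = ace_loss(log_probs, targets, target_lengths, blank=0)
    return ctc + ace_weight * ace
```

In a training loop, `joint_loss` would simply replace a plain CTC loss, with `log_probs` taken from the recognition network's log-softmax output.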
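The random frame-dropping augmentation is described only at a high level in the abstract; a plausible minimal sketch is shown below, where the drop ratio is an assumed hyperparameter rather than a value from the paper.

```python
# Hypothetical sketch of random frame dropping: during training, a random subset
# of frames is removed from each clip so the model cannot overfit to a fixed
# temporal alignment between frames and glosses.
import torch

def random_frame_drop(frames: torch.Tensor, drop_ratio: float = 0.2) -> torch.Tensor:
    """frames: (T, C, H, W) video clip. Returns a clip with ~drop_ratio of frames removed."""
    T = frames.shape[0]
    keep = max(1, int(round(T * (1.0 - drop_ratio))))
    idx, _ = torch.sort(torch.randperm(T)[:keep])   # keep surviving frames in temporal order
    return frames[idx]
```

Such an augmentation would be applied per clip during training only; at test time all frames are kept.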

Keywords: continuous sign language recognition; attention mechanism; temporal-spatial attention mechanism; connectionist temporal classification; joint training; weak supervision; cross-modal; deep learning

Classification code: TP183 [Automation and Computer Technology - Control Theory and Control Engineering]

 
