Authors: LIU Shiyao (刘诗瑶); ZHANG Zhongmin (张忠民) [1]
Affiliation: [1] College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, Heilongjiang, China
Source: Applied Science and Technology (《应用科技》), 2023, No. 3, pp. 44-49 (6 pages)
Abstract: Continuous sign language recognition, which converts a video sequence into a symbol sequence, is a typical weakly supervised problem: only sentence-level labels are provided, with no frame-level labels that carry temporal boundaries. Connectionist temporal classification (CTC) is currently the most widely used method for overcoming this. A temporal-spatial attention mechanism (the English abstract names the convolutional block attention module, CBAM) is introduced in the feature extraction stage, and an auxiliary alignment module is added so that the feature extraction part is trained with an aggregation cross-entropy (ACE) divergence loss. The model is trained end to end, jointly combining ACE divergence for segment-level feature learning with CTC for global sequence feature learning. In addition, a random frame dropping mechanism is proposed to further alleviate the serious overfitting problem. The effectiveness of the improved method is verified on the Chinese sign language dataset CSLR, where it obtains a word error rate (WER) of 6.9% on the validation set and 4.3% on the test set.
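Keywords: continuous sign language recognition; attention mechanism; temporal-spatial attention mechanism; connectionist temporal classification; joint training; weak supervision; cross-modal; deep learning
Classification number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]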
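
The abstract reports its results as word error rate (WER). The paper's own evaluation script is not part of this record, so the following is only a minimal sketch of how the metric is conventionally computed: the edit distance (substitutions, deletions, insertions) between reference and predicted gloss sequences, divided by the total number of reference glosses. The function names and the example gloss tokens are illustrative assumptions, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (hypothetical helper)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # matching ref[:i] against an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: a reference token is missing
                curr[j - 1] + 1,     # insertion: the hypothesis has an extra token
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total edit errors / total reference tokens."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_tokens = sum(len(r) for r in references)
    if total_tokens == 0:
        raise ValueError("references contain no tokens")
    return total_errors / total_tokens


# Made-up gloss sequences, for illustration only.
refs = [["I", "LIKE", "GREEN", "TEA"]]
hyps = [["I", "LIKE", "TEA"]]
print(word_error_rate(refs, hyps))  # one deletion over four reference glosses
```

Under this definition, a WER of 4.3% means roughly 4.3 edit errors per 100 reference glosses, summed over the whole test set.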