A Fusion of Longformer for Spatio-Temporal Separation Attention Assembly Action Recognition Network


Authors: CHEN Cong-ping; ZHANG Chun-sheng (School of Mechanical Engineering and Rail Transit, Changzhou University, Changzhou 213164)

Affiliation: [1] School of Mechanical Engineering and Rail Transit, Changzhou University, Changzhou 213164, Jiangsu, China

Source: Manufacturing Automation, 2025, No. 4, pp. 112-119 (8 pages)

Funding: Jiangsu Province Industry Foresight and Key Core Technology Project (BE2022044).

Abstract: To improve assembly quality and efficiency and ensure consistency in product quality, workers' assembly actions are recognized and monitored. This paper proposes an assembly action recognition network that fuses Longformer with spatiotemporal separated attention. Within the spatiotemporal separated attention structure, a Longformer attention encoder and a Transformer attention encoder are used separately to extract the appearance and motion features of the video, effectively integrating the spatiotemporal information in long video sequences while reducing the network's computational complexity and parameter count. Experimental results on an assembly action dataset show that the method extracts global video features better than the convolution-based SlowFast network, improving Top-1 accuracy by 2.44%. Compared with the Transformer-based TimeSformer network, Top-1 accuracy improves by 0.45% and the parameter count is reduced by 65.9%, while longer video sequences can be input, enabling more effective recognition of workers' assembly actions.
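The factorized design described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an illustrative assumption, not the paper's implementation: the actual encoders use learned projections, multiple heads, and (in Longformer) optional global tokens, none of which are modeled. The sketch shows the two key ideas — attention is split into a temporal pass and a spatial pass, and the temporal pass uses a Longformer-style sliding-window mask so each frame attends only to its w neighbors, reducing the temporal cost from O(T²) to O(T·w) for long sequences.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sliding_window_mask(n, w):
    """Longformer-style local mask: position i may attend to j iff |i - j| <= w."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= w

def attention(q, k, v, mask=None):
    """Scaled dot-product attention; masked-out scores are set to -1e9 before softmax."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v

def divided_spacetime_attention(x, w=2):
    """Hypothetical divided space-time attention over x of shape (T frames, P patches, d channels).

    Temporal pass: for each patch position, attend across frames under a
    sliding-window mask of radius w (Longformer-style local attention).
    Spatial pass: within each frame, full Transformer attention over patches.
    """
    T, P, d = x.shape
    tmask = sliding_window_mask(T, w)
    xt = np.stack([attention(x[:, p], x[:, p], x[:, p], tmask) for p in range(P)], axis=1)
    xs = np.stack([attention(xt[t], xt[t], xt[t]) for t in range(T)], axis=0)
    return xs
```

Because the two passes are applied sequentially rather than jointly, the score matrices have sizes T×T (masked) and P×P instead of a single (T·P)×(T·P) matrix, which is the source of the parameter and compute savings the abstract attributes to the spatiotemporal separation.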

Keywords: action recognition; attention mechanism; spatiotemporal separated attention; Longformer

CLC number: TP391.41 [Automation and Computer Technology — Computer Application Technology]
