Authors: JIANG Lei; WANG Zi-qi; CUI Zhen-yu; CHANG Zhi-yong [3,4]; SHI Xiao-hu
Affiliations: [1] Faculty of Mechanics and Mathematics, Moscow State University, Moscow 119991, Russia; [2] Faculty of Computational Mathematics and Cybernetics, Moscow State University, Moscow 119991, Russia; [3] College of Biological and Agricultural Engineering, Jilin University, Changchun 130022, China; [4] Key Laboratory of Bionic Engineering, Ministry of Education, Jilin University, Changchun 130022, China; [5] College of Computer Science and Technology, Jilin University, Changchun 130012, China; [6] Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
Source: Journal of Jilin University: Engineering and Technology Edition, 2024, No. 7, pp. 2049-2056 (8 pages)
Funding: National Natural Science Foundation of China (62272192); Jilin Province Science and Technology Development Plan Project (20210201080GX); Jilin Province Development and Reform Commission Project (2021C044-1)
Abstract: In recent years, Vision Transformer (ViT) has shown remarkable potential in areas such as image classification, object detection, and image generation. However, the performance of ViT depends heavily on the number of network parameters, which limits its application scenarios. Inspired by neurology, this paper proposes applying the recurrent structure found between neurons in the human brain to ViT. It explains, for the first time, how the recurrent structure works from the perspective of Riemannian geometry, and then presents a recurrent Vision Transformer model built on the Token-to-Token Transformer (T2T Transformer) backbone. Experimental results show that introducing the recurrent structure substantially improves the performance of ViT with essentially no change in parameter count: on the ImageNet classification dataset, the network gains only 0.14% more parameters but achieves a 9% improvement in classification accuracy; in the object detection task, a 0.1% parameter increase brings a 10.7% performance improvement.
Keywords: Vision Transformer; recurrent structure; Riemannian geometry
Classification Code: TP391 [Automation and Computer Technology: Computer Application Technology]
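The abstract's central idea is that a recurrent structure lets computation deepen while the parameter count stays almost fixed, because the same block is reused across steps. The record does not include the paper's actual architecture, so the sketch below is only a hedged illustration of that general mechanism: a toy single-head self-attention block (all names here, such as `ToyAttentionBlock` and `recurrent_forward`, are hypothetical, not from the paper) applied repeatedly with shared weights.

```python
import numpy as np

# Illustrative sketch only: recurrence via weight sharing. The SAME block is
# applied several times, so effective depth grows while parameters stay fixed.
rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ToyAttentionBlock:
    """Single-head self-attention with a residual connection (toy stand-in
    for a Transformer block; not the paper's T2T architecture)."""
    def __init__(self, dim):
        self.Wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def n_params(self):
        return sum(w.size for w in (self.Wq, self.Wk, self.Wv))

    def __call__(self, x):  # x: (tokens, dim)
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[1]))
        return x + attn @ v  # residual connection keeps recurrence stable

def recurrent_forward(block, x, steps):
    """Apply one shared block `steps` times: deeper computation, no new weights."""
    for _ in range(steps):
        x = block(x)
    return x

block = ToyAttentionBlock(dim=16)
tokens = rng.standard_normal((8, 16))
out = recurrent_forward(block, tokens, steps=3)
# out keeps shape (8, 16); block.n_params() is 3*16*16 regardless of `steps`
```

The key property mirrored here is the one the abstract reports: increasing the number of recurrent steps changes the computation, not the parameter count, which is why the paper's networks gain accuracy with only a ~0.1% parameter increase.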