Authors: JIANG Lei; WANG Zi-qi; CUI Zhen-yu; CHANG Zhi-yong [3,4]; SHI Xiao-hu
Affiliations: [1] Faculty of Mechanics and Mathematics, Moscow State University, Moscow 119991, Russia; [2] Faculty of Computational Mathematics and Cybernetics, Moscow State University, Moscow 119991, Russia; [3] College of Biological and Agricultural Engineering, Jilin University, Changchun 130022, China; [4] Key Laboratory of Bionic Engineering, Ministry of Education, Jilin University, Changchun 130022, China; [5] College of Computer Science and Technology, Jilin University, Changchun 130012, China; [6] Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
Source: Journal of Jilin University: Engineering and Technology Edition, 2024, No. 7, pp. 2049-2056 (8 pages)
Funding: National Natural Science Foundation of China (62272192); Jilin Province Science and Technology Development Plan Project (20210201080GX); Jilin Province Development and Reform Commission Project (2021C044-1)
Abstract: In recent years, Vision Transformer (ViT) has shown remarkable potential in areas such as image classification, object detection, and image generation. However, the performance of ViT depends heavily on the number of network parameters, which limits its application scenarios. Inspired by neurology, this paper proposes applying the recurrent structure found between neurons in the human brain to ViT. It explains, for the first time, how the recurrent structure works from the perspective of Riemannian geometry, and then presents a recurrent Vision Transformer model built on the Token-to-Token Transformer (T2T Transformer) backbone. Experimental results show that introducing the recurrent structure substantially improves the performance of ViT with essentially no change in parameter count: on the ImageNet classification dataset, the network gains only 0.14% more parameters but achieves a 9% improvement in classification accuracy; in the object detection task, a 0.1% parameter increase brings a 10.7% performance improvement.
Keywords: Vision Transformer; recurrent structure; Riemannian geometry
Classification Code: TP391 [Automation and Computer Technology: Computer Application Technology]
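The abstract's central idea is that a recurrent structure lets computation deepen while the parameter count stays almost fixed, because the same block is reused across steps. The record does not include the paper's actual architecture, so the sketch below is only a hedged illustration of that general mechanism: a toy single-head self-attention block (all names here, such as `ToyAttentionBlock` and `recurrent_forward`, are hypothetical, not from the paper) applied repeatedly with shared weights.

```python
import numpy as np

# Illustrative sketch only: recurrence via weight sharing. The SAME block is
# applied several times, so effective depth grows while parameters stay fixed.
rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ToyAttentionBlock:
    """Single-head self-attention with a residual connection (toy stand-in
    for a Transformer block; not the paper's T2T architecture)."""
    def __init__(self, dim):
        self.Wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def n_params(self):
        return sum(w.size for w in (self.Wq, self.Wk, self.Wv))

    def __call__(self, x):  # x: (tokens, dim)
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[1]))
        return x + attn @ v  # residual connection keeps recurrence stable

def recurrent_forward(block, x, steps):
    """Apply one shared block `steps` times: deeper computation, no new weights."""
    for _ in range(steps):
        x = block(x)
    return x

block = ToyAttentionBlock(dim=16)
tokens = rng.standard_normal((8, 16))
out = recurrent_forward(block, tokens, steps=3)
# out keeps shape (8, 16); block.n_params() is 3*16*16 regardless of `steps`
```

The key property mirrored here is the one the abstract reports: increasing the number of recurrent steps changes the computation, not the parameter count, which is why the paper's networks gain accuracy with only a ~0.1% parameter increase.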