Authors: Lirong Yin, Lei Wang, Siyu Lu, Ruiyang Wang, Youshuai Yang, Bo Yang, Shan Liu, Ahmed AlSanad, Salman A. AlQahtani, Zhengtong Yin, Xiaolu Li, Xiaobing Chen, Wenfeng Zheng
Affiliations: [1] Department of Geography and Anthropology, Louisiana State University, Baton Rouge, LA, 70803, USA [2] School of Automation, University of Electronic Science and Technology of China, Chengdu, 610054, China [3] College of Computer and Information Sciences, King Saud University, Riyadh, 11574, Saudi Arabia [4] College of Resources and Environmental Engineering, Key Laboratory of Karst Georesources and Environment (Guizhou University), Ministry of Education, Guiyang, 550025, China [5] School of Geographical Sciences, Southwest University, Chongqing, 400715, China [6] School of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
Source: Computer Modeling in Engineering & Sciences, 2024, No. 10, pp. 87-106 (20 pages; English-language journal)
Funding: Supported by the Sichuan Science and Technology Program (2021YFQ0003, 2023YFSY0026, 2023YFH0004).
Abstract: This study addresses the limitations of Transformer models in image feature extraction, particularly their lack of inductive bias for visual structures. Compared to Convolutional Neural Networks (CNNs), Transformers are more sensitive to optimizer hyperparameters, which leads to unstable training and slow convergence. To tackle these challenges, we propose the Convolution-based Efficient Transformer Image Feature Extraction Network (CEFormer) as an enhancement of the Transformer architecture. Our model incorporates E-Attention, depthwise separable convolution, and dilated convolution to introduce crucial inductive biases, such as translation invariance, locality, and scale invariance, into the Transformer framework. Additionally, we use a lightweight convolution module to process the input images, which yields faster convergence and improved stability. The result is an efficient image feature extraction network that combines convolution with the Transformer. Experimental results on ImageNet-1k, measured by Top-1 accuracy, demonstrate that the proposed network achieves better accuracy while maintaining high computational speed. It achieves up to 85.0% accuracy across various model sizes on image classification, outperforming various baseline models. When integrated into the Mask Region-based Convolutional Neural Network (Mask R-CNN) framework as a backbone network, CEFormer outperforms other models and achieves the highest mean Average Precision (mAP) scores. This research presents a significant advancement in Transformer-based image feature extraction, balancing performance and computational efficiency.
Keywords: Transformer; E-Attention; depth convolution; dilated convolution; CEFormer
Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]