Fish image classification based on positional overlapping patch embedding and multi-scale channel interactive attention

Cited by: 1

Authors: ZHOU Wen, CHEN Yuzhang[1], WEN Zhiyuan, WANG Shiqi[1]

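Affiliations: [1] School of Artificial Intelligence, Hubei University, Wuhan, Hubei 430062, China; [2] School of Computer Science and Information Engineering, Hubei University, Wuhan, Hubei 430062, China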

Source: Journal of Computer Applications (《计算机应用》), 2024, No. 10, pp. 3209-3216 (8 pages)

Funding: Industry-University Cooperation Collaborative Education Project of the Ministry of Education (202101142041).

Abstract: Underwater fish image classification is a highly challenging task. The conventional Vision Transformer (ViT) backbone has difficulty handling local continuous features and performs poorly on fish classification when image quality is low. To address this problem, PIFormer (Positional overlapping and Interactive attention transFormer), a Transformer image classification network based on positional-encoding Overlapping Patch Embedding (OPE) and Multi-scale Channel Interactive Attention (MCIA), is proposed. PIFormer has a multi-level structure, with each level stacked a different number of times, which helps extract features at different depths. First, a deep Positional Overlapping Patch Embedding (POPE) module splits the feature map and edge information into overlapping patches to retain the local continuous features of the fish body, and adds position information to order the patches, helping PIFormer integrate detailed features and build a global mapping. Second, the MCIA module processes local and global features in parallel and establishes long-range dependencies between different parts of the fish body. Finally, a Group Multi-Layer Perceptron (GMLP) processes high-level features in groups to improve network efficiency and produce the final fish classification. To verify the effectiveness of PIFormer, a self-built dataset of freshwater fish from East Lake was constructed, and the public datasets Fish4Knowledge and NCFM (Nature Conservancy Fisheries Monitoring) were used to ensure experimental fairness. Experimental results show that the Top-1 classification accuracy of the proposed network on these datasets reaches 97.99%, 99.71% and 90.45%, respectively. Compared with ViT, Swin Transformer and PVT (Pyramid Vision Transformer) of comparable depth, the proposed network reduces the number of parameters by 72.62×10^6, 14.34×10^6 and 11.30×10^6, respectively, and the number of floating-point operations (FLOPs) by 14.52×10^9, 2.02×10^9 and 1.48×10^9, respectively. PIFormer therefore achieves strong fish image classification performance at a lower computational cost.
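The idea of overlapping patches can be illustrated with a minimal sketch. This is not the POPE module: the abstract does not give its kernel size, stride, or padding, so the function name `extract_patches` and the values `patch=4, stride=2` are hypothetical and chosen only for illustration. The sketch shows that when the stride is smaller than the patch size, neighbouring patches share pixels, whereas a ViT-style split with stride equal to patch size shares none.

```python
def extract_patches(image, patch, stride):
    """Slide a patch x patch window over a 2D grid with the given stride."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            window = [row[left:left + patch] for row in image[top:top + patch]]
            patches.append(window)
    return patches


def shared_pixels(a, b):
    """Count pixel values that appear in both patches (values are unique here)."""
    flat_a = {value for row in a for value in row}
    flat_b = {value for row in b for value in row}
    return len(flat_a & flat_b)


# Toy 6x6 "image" whose pixel value encodes its position (row * 6 + col).
image = [[r * 6 + c for c in range(6)] for r in range(6)]

# ViT-style split: stride equals patch size, so patches never share pixels.
plain = extract_patches(image, patch=2, stride=2)

# Overlapping split (hypothetical sizes): stride smaller than patch size,
# so horizontally adjacent patches share a band of columns.
overlapping = extract_patches(image, patch=4, stride=2)

print(len(plain), shared_pixels(plain[0], plain[1]))
print(len(overlapping), shared_pixels(overlapping[0], overlapping[1]))
```

In a real network this step is typically a strided convolution rather than an explicit loop; the loop form is used here only to make the shared region between neighbouring patches easy to inspect.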

Keywords: fish image classification; positional encoding; overlapping patch embedding; channel interactive attention; Vision Transformer

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]
