D-PAG: Cross-modal Wolfberry Pest Recognition Model Based on Parameter-Efficient Fine-Tuning

Authors: XING JiaLu; LIU JianPing; ZHOU GuoMin; LIU LiBo [5]; WANG Jian

Affiliations: [1] College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; [2] The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, Yinchuan 750021, China; [3] Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China; [4] National Agriculture Science Data Center, Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; [5] School of Information Engineering, Ningxia University, Yinchuan 750021, China; [6] Agricultural Information Institute of CAAS, Beijing 100081, China

Source: Journal of Agricultural Big Data, 2024, Issue 4, pp. 509-521 (13 pages)

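Funding: National Natural Science Foundation of China (32460444); Key Scientific Research Project of North Minzu University (2023ZRLG12); Graduate Innovation Project of North Minzu University (YCX24126).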

Abstract: With the development of multimodal foundation models (large models), transferring them efficiently to specific domains or tasks has become both a research hotspot and an open challenge. This study takes the multimodal large model CLIP as the base model and uses the parameter-efficient fine-tuning methods Prompt and Adapter to transfer CLIP to wolfberry pest recognition, proposing D-PAG, a cross-modal parameter-efficient fine-tuning model for this task. D-PAG first embeds learnable Prompts and Adapters in the input layer or hidden layers of the CLIP encoder, where they are trained to learn pest features. Gated units then integrate the Prompts and Adapters into the CLIP encoder network and balance their relative influence on feature extraction. Within the Adapter, a GCS-Adapter attention module is designed to strengthen cross-modal semantic information fusion. To validate the method, experiments were conducted on a wolfberry pest dataset and on the fine-grained dataset IP102. On the wolfberry dataset, training with only 20% of the samples reached an accuracy of 98.8%, and training with 40% of the samples reached 99.5%. On IP102, accuracy reached 75.6%, on par with ViT. Under few-shot conditions and with very few additional parameters, the approach efficiently transfers the general knowledge of a multimodal large model to the specific domain of pest recognition, offering a new technical solution for applying large models to agricultural image processing.
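The abstract names three ingredients: learnable prompts, bottleneck adapters, and a gate that balances their influence. It does not give the exact D-PAG equations, so the following is only a minimal sketch of the general idea under stated assumptions. The scalar sigmoid gate, the residual form `x + g * adapter(x)`, and every function name here (`prepend_prompts`, `bottleneck_adapter`, `gated_adapter`) are illustrative assumptions, not the authors' implementation. The GCS-Adapter attention is not modelled.

```python
import math


def sigmoid(z):
    """Logistic function; maps a learnable gate logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def prepend_prompts(prompts, tokens):
    """Prompt tuning: put learnable prompt vectors in front of the token sequence.

    The frozen encoder then attends over prompts and tokens together.
    """
    return [list(p) for p in prompts] + [list(t) for t in tokens]


def bottleneck_adapter(x, w_down, w_up):
    """Adapter: project down to a small width, apply ReLU, project back up."""
    hidden = [max(0.0, h) for h in matvec(w_down, x)]
    return matvec(w_up, hidden)


def gated_adapter(x, w_down, w_up, gate_logit):
    """Assumed gating form: frozen feature plus a gate-weighted adapter update.

    Only w_down, w_up and gate_logit would be trained; x comes from the
    frozen backbone.
    """
    g = sigmoid(gate_logit)
    delta = bottleneck_adapter(x, w_down, w_up)
    return [xi + g * di for xi, di in zip(x, delta)]


if __name__ == "__main__":
    tokens = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 0.0, 1.0]]
    prompts = [[0.0, 0.0, 0.0, 0.0]]  # one learnable prompt vector
    sequence = prepend_prompts(prompts, tokens)

    w_down = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # 4 -> 2
    w_up = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]  # 2 -> 4
    print(len(sequence), gated_adapter(tokens[0], w_down, w_up, gate_logit=0.0))
```

In this sketch, a zero-initialised `w_up` makes the adapter branch a no-op, so the frozen features pass through unchanged at the start of training. The gate logit then controls how strongly the adapter update is mixed in. With feature width d and bottleneck width r, the adapter adds about 2·d·r weights per layer, which is one way few extra parameters can suffice.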

Keywords: wolfberry pest recognition; parameter-efficient fine-tuning; large models; CLIP

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
