Agri-QA Net: Multimodal Fusion Large Language Model Architecture for Crop Knowledge Question-Answering System


Authors: WU Huarui[1]; ZHAO Chunjiang[1]; LI Jingchen (Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100079, China)

Affiliation: [1] Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100079, China

Source: Smart Agriculture, 2025, No. 1, pp. 1-10 (10 pages)

Funding: National Key Research and Development Program of China (2021ZD0113604); Science and Technology Innovation 2030 Major Project (2022ZD0115705-05).

Abstract: [Purpose/Significance] With the rapid development of agricultural informatization and intelligence, multimodal human-computer interaction technology is becoming increasingly important in agriculture. This study proposes Agri-QA Net, a large model architecture based on multimodal fusion, to build a specialized multimodal question-answering system for agricultural knowledge about cabbage crops.

[Methods] The model integrates text, audio, and image data. A pretrained BERT (Bidirectional Encoder Representations from Transformers) model extracts text features, an acoustic model extracts audio features, and a convolutional neural network extracts image features; a Transformer-based fusion layer then integrates these features. In addition, a cross-modal attention mechanism and domain adaptation techniques are introduced to strengthen the model's understanding and application of specialized agricultural knowledge. Multimodal data related to cabbage cultivation were collected and preprocessed to train and optimize the Agri-QA Net model.

[Results and Discussion] Experimental evaluation shows that the model performs well on the cabbage knowledge question-answering task, with higher accuracy and better generalization than traditional single-modality or simple multimodal models. With multimodal input, it achieved an accuracy of 89.5%, a precision of 87.9%, a recall of 91.3%, and an F1 score of 89.6%, all significantly higher than those of single-modality models.

[Conclusions] Case studies demonstrate how Agri-QA Net performs in real agricultural scenarios and show its effectiveness in helping farmers solve practical problems. Future work will explore the model's application in more agricultural scenarios and further optimize its performance.

[Objective] As agriculture increasingly relies on technological innovations to boost productivity and ensure sustainability, farmers need efficient and accurate tools to aid their decision-making. A key challenge in this context is the retrieval of specialized agricultural knowledge, which can be complex and diverse. Traditional agricultural knowledge retrieval systems have often been limited by the modalities they use (e.g., text or images alone), which restricts their effectiveness in addressing the wide range of queries farmers face. To address this challenge, a specialized multimodal question-answering system tailored for cabbage cultivation was proposed. The system, named Agri-QA Net, integrates multimodal data to enhance the accuracy and applicability of agricultural knowledge retrieval. By incorporating diverse data modalities, Agri-QA Net aims to provide a holistic approach to agricultural knowledge retrieval, enabling farmers to interact with the system using multiple types of input, ranging from spoken queries to images of crop conditions. In doing so, it helps address the complexity of real-world agricultural environments and improves the accessibility of relevant information.

[Methods] The architecture of Agri-QA Net was built upon the integration of multiple data modalities, including textual, auditory, and visual data. This multifaceted approach enabled the system to develop a comprehensive understanding of agricultural knowledge and allowed it to learn from a wide array of sources, enhancing its robustness and generalizability. The system incorporated state-of-the-art deep learning models, each designed to handle one specific type of data. The bidirectional attention mechanism of Bidirectional Encoder Representations from Transformers (BERT) allowed the model to understand the context of each word in a given sentence, significantly improving its ability to comprehend complex agricultural terminology and specialized concepts. The system also incorporated acoustic models for processing audio inputs. T
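The abstract names a cross-modal attention mechanism but does not give its formulation. As a rough illustration only, the sketch below shows plain scaled dot-product attention in which a text feature vector attends over audio and image feature vectors. Everything in it is an assumption made for illustration: the function name `cross_modal_attention`, the four-dimensional toy vectors, and the omission of learned query/key/value projections. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(query, keys, values):
    """Scaled dot-product attention of one query vector over other modalities.

    Hypothetical helper: learned projection matrices are left out, so the
    raw feature vectors act directly as query, keys, and values.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # Weighted sum of the value vectors, one output dimension at a time.
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return fused, weights


# Toy vectors standing in for encoder outputs (made-up numbers, not real features).
text_feat = [1.0, 0.0, 1.0, 0.0]   # e.g. from a text encoder such as BERT
audio_feat = [0.9, 0.1, 0.8, 0.0]  # e.g. from an acoustic model
image_feat = [0.0, 1.0, 0.0, 1.0]  # e.g. from a CNN

others = [audio_feat, image_feat]
fused, weights = cross_modal_attention(text_feat, others, others)
print("attention weights (audio, image):", weights)
print("fused vector:", fused)
```

In this toy setup the text vector is more similar to the audio vector than to the image vector, so the audio features receive the larger attention weight. A full fusion layer would typically apply such attention in several directions with learned projections and stack it inside Transformer blocks.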

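Keywords: multimodal fusion; human-computer interaction; agricultural knowledge question answering; cabbage crop; large language model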

Classification code: S24 [Agricultural Sciences: Agricultural Electrification and Automation]

 
