Research on image description generation of rice diseases and pests based on multimodal pre-training model  Cited by: 2


Authors: XUE Yueping; HU Yanrong [1]; LIU Hongjiu [1]; TONG Lizhen [1]; GE Wanzhao (College of Mathematics and Computer Science / Zhejiang Key Laboratory of Forestry Intelligence Monitoring and Information Technology Research / Key Laboratory of Forestry Sensing Technology and Intelligent Equipment, National Forestry and Grassland Administration, Zhejiang A&F University, Hangzhou 311300, China)

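Affiliation: [1] College of Mathematics and Computer Science / Zhejiang Key Laboratory of Forestry Intelligence Monitoring and Information Technology Research / Key Laboratory of Forestry Sensing Technology and Intelligent Equipment, National Forestry and Grassland Administration, Zhejiang A&F University, Hangzhou 311300, Zhejiang, China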

Source: Journal of Nanjing Agricultural University, 2024, No. 4, pp. 782-791 (10 pages)

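Funding: Humanities and Social Sciences Research Planning Fund of the Ministry of Education (18YJA630037, 21YJA630054); Natural Science Foundation of Zhejiang Province (LY18G010005).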

Abstract: [Objectives] Image classification techniques for rice diseases and pests do not describe the symptoms they detect. This paper proposes a lightweight image description model that describes rice disease and pest images in more specific terms.

[Methods] Ten common rice diseases and pests were studied: bacterial blight, bacterial leaf streak, bakanae disease, yellow stem borer, rice blast, false smut, sheath blight, planthopper, rice thrips and brown spot. A Chinese-language description dataset of rice disease and pest images was constructed. First, the multimodal pre-training model CLIP was used to generate image vectors, which contain basic image information as well as rich semantic information. A mapping network then mapped the image vectors into the text space to produce text prompt vectors. Finally, the language model GPT-2 generated image descriptions from the text prompt vectors.

[Results] On the rice disease and pest image description dataset, the metrics of the proposed model were, overall, clearly better than those of the other models. Compared with the traditional CNN_LSTM model, BLEU-1, BLEU-2, BLEU-3, BLEU-4, ROUGE and METEOR improved by 0.26, 0.27, 0.24, 0.22, 0.22 and 0.14, respectively. The generated descriptions were accurate, detailed and semantically rich. The model was also tested on images from actual rice fields, where scenes are more complex and diverse. The description metrics decreased only slightly overall relative to the dataset metrics and remained higher than those of the comparison models. The overall recognition accuracy of the model for rice diseases and pests was 97.28%.

[Conclusions] The image description method based on a multimodal pre-training model can accurately identify the symptoms of rice diseases and pests and generate corresponding symptom descriptions, providing a new approach to rice disease and pest detection.
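The pipeline in the Methods (CLIP image vector, mapping network, text prompt vectors for GPT-2) can be sketched at the level of array shapes. The following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify the mapping network's architecture, so a two-layer MLP is assumed. The sizes (a 512-dimensional image vector, 768-dimensional token embeddings, 10 prompt vectors), the class `MappingNetwork` and the helper `build_decoder_input` are all hypothetical. Random arrays stand in for the real CLIP output and the embedded caption tokens.

```python
import numpy as np

# Assumed sizes for illustration only; none of them come from the paper.
CLIP_DIM = 512     # width of the image vector produced by the image encoder
GPT2_DIM = 768     # width of one language-model token embedding
PREFIX_LEN = 10    # number of text prompt vectors per image
HIDDEN_DIM = 256   # hidden width of the assumed MLP


class MappingNetwork:
    """Hypothetical two-layer MLP: one image vector -> PREFIX_LEN prompt vectors."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.02, (CLIP_DIM, HIDDEN_DIM))
        self.b1 = np.zeros(HIDDEN_DIM)
        self.w2 = rng.normal(0.0, 0.02, (HIDDEN_DIM, PREFIX_LEN * GPT2_DIM))
        self.b2 = np.zeros(PREFIX_LEN * GPT2_DIM)

    def __call__(self, image_vectors):
        # image_vectors has shape (batch, CLIP_DIM)
        hidden = np.tanh(image_vectors @ self.w1 + self.b1)
        flat = hidden @ self.w2 + self.b2
        # Split the flat output into PREFIX_LEN vectors in the text space
        return flat.reshape(-1, PREFIX_LEN, GPT2_DIM)


def build_decoder_input(prefix, caption_embeddings):
    """Place the prompt vectors in front of the caption token embeddings."""
    return np.concatenate([prefix, caption_embeddings], axis=1)


rng = np.random.default_rng(1)
image_vectors = rng.normal(size=(2, CLIP_DIM))           # stand-in for CLIP output
caption_embeddings = rng.normal(size=(2, 6, GPT2_DIM))   # stand-in for embedded captions

prefix = MappingNetwork()(image_vectors)
decoder_input = build_decoder_input(prefix, caption_embeddings)
print(prefix.shape, decoder_input.shape)
```

In a trained system of this kind, the concatenated sequence would be passed to the language model as input embeddings, and the description would be decoded token by token after the prompt vectors.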

Keywords: multimodal pre-training model; rice diseases and pests; image description generation; diagnosis

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
