Semantic segmentation model for greenhouse tomato images using RGB and depth bimodal data  Cited by: 4

Authors: ZHANG Yufeng (张羽丰), YANG Jing (杨景), DENG Hanbing (邓寒冰)[1,2], ZHOU Yuncheng (周云成), MIAO Teng (苗腾)[1,2]

Affiliations: [1] College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110866, China; [2] Liaoning Engineering Research Center for Information Technology in Agriculture, Shenyang 110866, China

Source: Transactions of the Chinese Society of Agricultural Engineering (《农业工程学报》), 2024, No. 2, pp. 295-306 (12 pages)

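Funding: Sub-project of the National Key Research and Development Program of China (2022YFD2002303-01); General Project of the Basic Scientific Research Program of the Education Department of Liaoning Province (JYTMS20231303); National Natural Science Foundation of China (31901399); Sub-project of the National Key Research and Development Program during the 14th Five-Year Plan period (2021YFD1500204).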

Abstract: Image semantic segmentation, an important technique in computer vision, has been widely used for plant phenotype detection in protected (facility) environments, robotic harvesting, and facility scene parsing. In greenhouses, immature tomato fruits are similar in color to their stems and leaves, which lowers segmentation accuracy. This study proposes DFST (depth-fusion semantic transformer), an "RGB + depth" (RGBD) bimodal semantic segmentation model built on a mix Transformer encoder. Depth images were acquired under real greenhouse lighting, HHA-encoded, and fed into the model together with the color images for training. The HHA-encoded depth image serves as an auxiliary modality that is fused with the RGB image for feature extraction, and a lightweight multilayer perceptron (MLP) decoder decodes the resulting feature maps to produce the final segmentation. Experimental results show that DFST reaches a mean intersection over union (mIoU) of 96.99% on the test set, 1.37 percentage points higher than the same model without depth images and 2.43 percentage points higher than ShapeConv, an RGBD semantic segmentation model that uses a convolutional neural network as its feature-extraction backbone. These results show that depth information improves the semantic segmentation accuracy of color images and markedly improves the accuracy and robustness of segmentation in complex scenes. They also show that a Transformer structure performs well as the feature-extraction network for image semantic segmentation. The model can provide a solution and technical support for semantic segmentation of tomato images in greenhouse environments.

Image semantic segmentation has been widely used in applications such as plant phenotyping, robotic harvesting, and facility scene analysis. Tomato is one of the most important vegetable crops grown in greenhouses, and phenotypic information about its fruit, such as shape and color, must be monitored periodically. Manual sampling and measurement, however, are time-consuming, labor-intensive, and inefficient, and cannot meet the requirements of high throughput and precision. In recent years, computer vision, and image semantic segmentation in particular, has become a promising alternative; it is frequently used to separate crop fruits (foreground) from the growth environment (background) in complex scenes. Segmentation accuracy in greenhouses still needs improvement because of uneven lighting, overlap and occlusion between fruits and leaves, and the similar texture and color of immature fruits and leaves. Traditional semantic segmentation models based on deep convolutional networks are trained on the RGB modality alone, and despite the continuous evolution of deep learning models, the accuracy achievable with RGB-only training has reached a bottleneck. In this study, an "RGB+Depth" multimodal semantic segmentation model, called DFST (depth-fusion semantic transformer), was proposed using a hybrid Transformer encoder (mix transformer encoder, MiT). MiT, a Transformer encoder backbone better suited to semantic segmentation, was adopted as the main feature-extraction network of DFST. Compared with ordinary Vision Transformers (ViTs), MiT offers the following advantages: 1) a hierarchical encoder outputs multi-scale features, which the decoder combines to capture both high-resolution coarse features and low-resolution fine-grained features and thereby optimize segmentation; 2) computational complexity was reduced …
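The abstract states that depth images are HHA-encoded before being fused with the RGB images, but does not give the encoding details. HHA, introduced by Gupta et al. for RGB-D recognition, turns each depth pixel into three channels: horizontal disparity, height above ground, and the angle between the local surface normal and gravity. The sketch below is a simplified, hypothetical version and not the authors' code: it assumes pinhole intrinsics `fx, fy, cx, cy`, assumes gravity lies along the camera's -y axis (the original HHA estimates the gravity direction from the data), treats the lowest observed point as the ground, and min-max scales each channel per image instead of using fixed scalings.

```python
import numpy as np


def hha_encode(depth_m, fx, fy, cx, cy):
    """Simplified HHA encoding of a metric depth map (H x W, metres).

    Assumptions (hypothetical, not from the paper): pinhole intrinsics
    fx, fy, cx, cy; "up" is the camera's -y axis (image rows grow downward);
    the lowest valid point is ground level; depth <= 0 marks invalid pixels.
    Returns an H x W x 3 uint8 image: [disparity, height, angle].
    """
    depth = depth_m.astype(np.float64)
    valid = depth > 0
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]

    # Back-project every pixel to 3-D camera coordinates (NaN where invalid).
    z = np.where(valid, depth, np.nan)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)

    # Channel 1: horizontal disparity, proportional to 1 / depth.
    disparity = np.where(valid, 1.0 / np.where(valid, depth, 1.0), 0.0)

    # Channel 2: height above the lowest valid point (camera y points down).
    height = np.where(valid, np.nanmax(y) - y, 0.0)

    # Channel 3: angle between the surface normal and the assumed up vector.
    dp_dv, dp_du = np.gradient(points, axis=(0, 1))
    normals = np.cross(dp_du, dp_dv)
    norm = np.linalg.norm(normals, axis=-1, keepdims=True)
    normals = normals / np.where(norm > 0, norm, 1.0)
    up = np.array([0.0, -1.0, 0.0])
    # Absolute cosine, so the sign of the estimated normal does not matter.
    cos_angle = np.clip(np.abs(normals @ up), 0.0, 1.0)
    angle = np.degrees(np.arccos(cos_angle))
    angle = np.where(np.isfinite(angle) & valid, angle, 0.0)

    def to_uint8(channel):
        # Min-max scale one channel over valid pixels to 0..255.
        lo, hi = channel[valid].min(), channel[valid].max()
        scaled = (channel - lo) / (hi - lo) if hi > lo else np.zeros_like(channel)
        return np.where(valid, np.clip(scaled * 255.0, 0, 255), 0).astype(np.uint8)

    return np.stack([to_uint8(disparity), to_uint8(height), to_uint8(angle)], axis=-1)
```

The resulting three-channel image has the same layout as an RGB image, which is what allows a depth branch to reuse an image backbone. For a fronto-parallel wall at constant depth, the disparity and angle channels are constant (and therefore scale to 0), while the height channel runs from 255 on the top row to 0 on the bottom row.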
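The results are reported as mean intersection over union (mIoU). For each class, IoU = TP / (TP + FP + FN), and mIoU averages the per-class values. A minimal sketch of the standard computation via a confusion matrix follows; it assumes integer label maps with classes `0..num_classes-1` and does not handle an "ignore" label, which the paper's evaluation may or may not use.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Accumulate a num_classes x num_classes confusion matrix.

    Rows index the ground-truth class, columns the predicted class.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    idx = target * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Return (per-class IoU, mIoU); classes absent from both maps are skipped."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as the class but wrong
    fn = conf.sum(axis=1) - tp  # belonging to the class but missed
    denom = tp + fp + fn
    iou = np.divide(tp, denom, out=np.full_like(tp, np.nan), where=denom > 0)
    return iou, float(np.nanmean(iou))
```

Because the abstract reports gains in percentage points rather than relative terms, the implied baselines follow by subtraction: 96.99 - 1.37 = 95.62% mIoU for the RGB-only variant and 96.99 - 2.43 = 94.56% for ShapeConv.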
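The abstract says a lightweight MLP decoder turns the multi-scale MiT feature maps into the segmentation. The paper's exact decoder is not given here; the sketch below assumes it resembles SegFormer's All-MLP decoder, where each stage is projected to a common width by a per-pixel linear layer, upsampled to 1/4 of the input resolution, concatenated, fused, and classified. It uses NumPy with random, untrained weights purely to show the data flow; the stage widths (MiT-B0-like), the 64 x 64 input, and the three classes are hypothetical placeholders. SegFormer itself upsamples bilinearly and uses batch normalization in the fusion layer; this sketch uses nearest-neighbour upsampling and omits normalization for brevity.

```python
import numpy as np


def linear(x, w, b):
    """Per-pixel linear layer (same as a 1x1 convolution) on a C x H x W map."""
    c, h, wd = x.shape
    y = w @ x.reshape(c, h * wd) + b[:, None]
    return y.reshape(w.shape[0], h, wd)


def upsample_nearest(x, out_h, out_w):
    """Nearest-neighbour upsampling of a C x H x W map by integer factors."""
    fh, fw = out_h // x.shape[1], out_w // x.shape[2]
    return x.repeat(fh, axis=1).repeat(fw, axis=2)


def mlp_decoder(features, embed_dim, num_classes, rng):
    """SegFormer-style All-MLP decoder with random (untrained) weights.

    features: encoder maps C_i x H_i x W_i ordered from the 1/4-scale stage
    to the 1/32-scale stage. Returns num_classes x H/4 x W/4 logits.
    """
    out_h, out_w = features[0].shape[1:]

    def init(out_c, in_c):
        return rng.standard_normal((out_c, in_c)) * 0.02, np.zeros(out_c)

    unified = []
    for f in features:
        w, b = init(embed_dim, f.shape[0])  # unify channel width per stage
        unified.append(upsample_nearest(linear(f, w, b), out_h, out_w))
    fused = np.concatenate(unified, axis=0)  # (4 * embed_dim) x H/4 x W/4
    w, b = init(embed_dim, fused.shape[0])
    fused = np.maximum(linear(fused, w, b), 0.0)  # fusion layer + ReLU
    w, b = init(num_classes, embed_dim)
    return linear(fused, w, b)  # per-pixel class logits


# Hypothetical MiT-B0-like stage widths and a 64 x 64 input image.
rng = np.random.default_rng(0)
H, W = 64, 64
widths, strides = [32, 64, 160, 256], [4, 8, 16, 32]
feats = [rng.standard_normal((c, H // s, W // s)) for c, s in zip(widths, strides)]
logits = mlp_decoder(feats, embed_dim=256, num_classes=3, rng=rng)
pred = logits.argmax(axis=0)  # 16 x 16 label map at 1/4 resolution
```

The decoder contains only linear layers, which is what makes it lightweight relative to convolutional decoders; in practice the 1/4-resolution logits are upsampled once more to the full input size before computing the loss or the label map.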

Keywords: greenhouse; crops; semantic segmentation; attention mechanism; facility environment; tomato images; RGBD; Transformer

CLC number: S126 [Agricultural Science: Basic Agricultural Science]