Zero-Shot Food Image Detection Based on Transformer

Authors: SONG Jingru; MIN Weiqing; ZHOU Pengfei; RAO Quanrui; SHENG Guorui [1]; YANG Yancun [1]; WANG Lili [1]; JIANG Shuqiang [2,3]

Affiliations: [1] School of Information and Electrical Engineering, Ludong University, Yantai 264025, Shandong, China; [2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; [3] Key Lab of Intelligent Information Processing, Chinese Academy of Sciences, Beijing 100190, China

Source: Science and Technology of Food Industry (《食品工业科技》), 2024, Issue 22, pp. 18-26 (9 pages)

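Funding: National Natural Science Foundation of China, Young Scientists Fund (61705098); National Natural Science Foundation of China, General Program (61872170); Natural Science Foundation of Shandong Province (ZR2023MF031).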

Abstract: Food detection, a fundamental task in food computing, locates and recognizes the food items in an input image, and plays a crucial role in applications such as smart canteen checkout and dietary health management. In real-world scenarios, however, food categories are continually updated, so a food detector trained on a fixed set of categories struggles to accurately detect food categories it has never seen. To address this problem, this paper proposes a zero-shot food image detection method. First, a Transformer-based food primitive generator is constructed, in which each primitive carries fine-grained attributes related to food categories; the primitives can be selectively assembled according to the characteristics of a food to synthesize features for unseen classes. Second, to impose more constraints on the visual features of unseen classes, an enhancement component for visual feature disentanglement is proposed. It decomposes the visual features of a food image into semantically related and semantically unrelated features, so that the semantic knowledge of food categories transfers better to their visual features. The method is evaluated through extensive experiments and ablation studies on two food datasets, ZSFooD and UEC-FOOD256. Under the zero-shot detection (ZSD) setting, it achieves the best average precision on unseen classes, 4.9% and 24.1%, respectively. Under the generalized zero-shot detection (GZSD) setting, the harmonic mean (HM) of seen and unseen classes reaches 5.8% and 22.0%, respectively. These results demonstrate the effectiveness of the proposed method.
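The harmonic mean (HM) reported for the GZSD setting rewards a detector only when it performs well on both seen and unseen classes. The sketch below shows the standard harmonic-mean formula, HM = 2·S·U / (S + U). It assumes S and U are the per-split scores (for example, mean average precision on seen and on unseen classes); the abstract does not state which metric is combined, and the function name and input values here are hypothetical illustrations, not results from the paper.

```python
def harmonic_mean(seen_score: float, unseen_score: float) -> float:
    """Harmonic mean of a seen-class score and an unseen-class score.

    Both scores must be non-negative and in the same unit (e.g. percent).
    Assumption: this is the usual GZSD-style HM; the paper's exact
    evaluation protocol is not given in the abstract.
    """
    if seen_score < 0 or unseen_score < 0:
        raise ValueError("scores must be non-negative")
    total = seen_score + unseen_score
    if total == 0:
        # Both scores are zero, so the harmonic mean is defined as zero here.
        return 0.0
    return 2.0 * seen_score * unseen_score / total


# Hypothetical inputs: a detector that is strong on seen classes but weak
# on unseen ones is pulled toward the weaker score.
print(harmonic_mean(40.0, 10.0))  # 16.0, well below the arithmetic mean of 25.0
```

Because the harmonic mean is dominated by the smaller of the two scores, a method cannot raise HM by improving seen-class accuracy alone.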

Keywords: food image detection; zero-shot learning; generative model; Transformer; deep learning

Classification number: S126 [Agricultural Sciences: Basic Agricultural Sciences]
