Semi-supervised Open Vocabulary Multi-label Learning Based on Graph Prompting


Authors: Li Zhongnian, Huangfu Zhiyu, Yang Kaijie, Ying Peng, Sun Tongfeng [1,3], Xu Xinzheng

Affiliations: [1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116; [2] State Key Laboratory of CAD&CG (Zhejiang University), Hangzhou 310058; [3] Mine Digitization Engineering Research Center (China University of Mining and Technology), Ministry of Education, Xuzhou, Jiangsu 221116

Source: Journal of Computer Research and Development, 2025, No. 2, pp. 432-442 (11 pages)

Funding: National Natural Science Foundation of China (61976217, 62306320); Natural Science Foundation of Jiangsu Province (BK20231063); Open Research Project of the State Key Laboratory of CAD&CG, Zhejiang University (A2424); Postgraduate Innovation Program of China University of Mining and Technology (2024WLJCRCZL262).

Abstract: Semi-supervised multi-label learning trains a model with both labeled and unlabeled data, reducing the labeling cost of multi-label data while achieving good results, and has therefore attracted many researchers. However, during semi-supervised annotation, because the number of labels is large, some labels often end up with no annotated samples; such labels are called open vocabulary. The model cannot learn the label information of these open-vocabulary classes, which degrades its performance. To address this problem, we propose a semi-supervised open vocabulary multi-label learning method based on graph prompting. Specifically, the method fine-tunes a large pre-trained model with a prompt-based graph neural network to mine and explore the relationships between the open vocabulary and the supervised samples. Using multimodal data containing images and text, a graph neural network is constructed and serves as the text input of the pre-trained model during learning. Furthermore, by exploiting the generalization ability of the pre-trained model on the open vocabulary, pseudo-labels are generated for unsupervised samples and used to fine-tune the output classification layer, so that the model classifies open-vocabulary labels more accurately. Experimental results on multiple benchmark datasets, including VOC, COCO, CUB, and NUS, consistently show that the proposed method outperforms current mainstream methods and achieves state-of-the-art performance.
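The paper's implementation is not reproduced here, but the idea sketched in the abstract (label/prompt embeddings refined by a graph neural network and matched against image features from a CLIP-style pre-trained model, with thresholded pseudo-labels for open-vocabulary classes on unlabeled images) can be illustrated roughly. The PyTorch sketch below is a minimal illustration under those assumptions; GraphPromptHead, the adjacency matrix, and the 0.7 threshold are hypothetical choices for exposition, not the authors' code.

```python
# Minimal PyTorch sketch, NOT the authors' implementation. Assumptions: a CLIP-style
# backbone supplies image features and frozen text embeddings of the label names
# (including open-vocabulary labels); "adjacency" is a hypothetical label-relation
# matrix, e.g. built from label co-occurrence on the supervised samples.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphPromptHead(nn.Module):
    """Refines label (prompt) embeddings with one graph-propagation step, then scores
    each image against the refined label embeddings to produce multi-label logits."""

    def __init__(self, embed_dim: int, adjacency: torch.Tensor):
        super().__init__()
        self.register_buffer("A", adjacency)          # (L, L) row-normalized label graph
        self.propagate = nn.Linear(embed_dim, embed_dim)
        self.logit_scale = nn.Parameter(torch.tensor(10.0))

    def forward(self, image_feat: torch.Tensor, label_feat: torch.Tensor) -> torch.Tensor:
        # label_feat: (L, D) frozen text embeddings of the label names.
        refined = label_feat + F.relu(self.propagate(self.A @ label_feat))  # graph prompt
        image_feat = F.normalize(image_feat, dim=-1)
        refined = F.normalize(refined, dim=-1)
        return self.logit_scale * image_feat @ refined.t()                  # (B, L) logits


def pseudo_label(logits: torch.Tensor, threshold: float = 0.7) -> torch.Tensor:
    """Turns zero-shot scores on unlabeled images into hard pseudo-labels that can be
    used to fine-tune the output classification layer for open-vocabulary classes."""
    return (torch.sigmoid(logits) > threshold).float()
```

In such a setup only the lightweight graph/prompt head and the classification layer would be updated while the pre-trained encoders stay frozen, which is one plausible reading of the prompt-based fine-tuning strategy described in the abstract.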

Keywords: semi-supervised multi-label learning; pre-trained model; graph neural network; open vocabulary; prompt

CLC Number: TP391 [Automation and Computer Technology - Computer Application Technology]

 
