Debiasing vision-language models for vision tasks: a survey

Authors: Beier ZHU, Hanwang ZHANG

Affiliation: [1] School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore

Source: Frontiers of Computer Science (English edition), 2025, Issue 1, pp. 175-177 (3 pages)

Abstract: 1 Introduction. In recent years, foundation Vision-Language Models (VLMs) such as CLIP [1], which enable zero-shot transfer to a wide variety of domains without fine-tuning, have led to a significant shift in machine learning systems. Despite these impressive capabilities, it is concerning that VLMs are prone to inheriting biases from the uncurated datasets scraped from the Internet [2–5]. We examine these biases from three perspectives. (1) Label bias: certain classes (words) appear more frequently than others in the pre-training data. (2) Spurious correlation: non-target features, e.g., image background, are correlated with labels, resulting in poor group robustness. (3) Social bias, a special form of spurious correlation that concerns societal harm: unaudited image-text pairs may contain human prejudice, e.g., regarding gender, ethnicity, and age, that is correlated with targets. These biases are subsequently propagated to downstream tasks, leading to biased predictions.
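The label-bias problem described above can be illustrated with a toy sketch of CLIP-style zero-shot classification. This is illustrative only, not any specific method from the survey: the `class_log_prior` correction term is a hypothetical stand-in for prior-adjustment style debiasing, and the embeddings are made up.

```python
import numpy as np

def zero_shot_predict(image_emb, text_embs, class_log_prior=None):
    """Toy CLIP-style zero-shot classifier.

    Scores an image embedding against one text embedding per class via
    cosine similarity; optionally subtracts a (hypothetical) class
    log-prior to counteract label bias, i.e., classes over-represented
    in the pre-training data.
    """
    # L2-normalise so dot products become cosine similarities
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img
    if class_log_prior is not None:
        # Simple prior correction: penalise over-frequent classes
        logits = logits - class_log_prior
    return int(np.argmax(logits))

# Two classes; the image is almost equidistant from both text prompts,
# and a prior skewed toward class 0 mimics pre-training label bias.
text_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
image_emb = np.array([0.6, 0.55])

biased = zero_shot_predict(image_emb, text_embs)
corrected = zero_shot_predict(image_emb, text_embs,
                              class_log_prior=np.array([0.2, 0.0]))
print(biased, corrected)  # prior correction flips the borderline prediction
```

With the raw cosine scores the over-represented class 0 wins; after subtracting the assumed log-prior, the prediction flips to class 1, which is the intuition behind prior-based label-bias corrections.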

Keywords: TASKS, IMAGE, tuning

Classification: TP3 [Automation and Computer Technology / Computer Science and Technology]
