How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges  Cited by: 1


Authors: Haotong Qin, Ge-Peng Ji, Salman Khan, Deng-Ping Fan, Fahad Shahbaz Khan, Luc Van Gool

Affiliations: [1] Computer Vision Lab (CVL), ETH Zurich, Zurich 8001, Switzerland [2] College of Engineering, Computing & Cybernetics, Australian National University, Canberra 8105, Australia [3] Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi 999041, UAE

Source: Machine Intelligence Research (机器智能研究(英文版)), 2023, No. 5, pp. 605-613 (9 pages)

Abstract: Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned on text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, underwater and remote sensing data to comprehensively evaluate Bard's performance. Our primary finding indicates that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual data. Our project is released on https://github.com/htqin/GoogleBard-VisUnderstand.

Keywords: Google Bard, multi-modal understanding, visual comprehension, large language models, conversational AI, chatbot

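Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]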

 
