Recent Advances in Vision-and-Language Navigation

Citations: 2

Authors: SIMA Shuang-Lin, HUANG Yan [1,3], HE Ke-Ji, AN Dong, YUAN Hui, WANG Liang [1,2,3,4,5]

Affiliations: [1] Center of Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; [2] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049; [3] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; [4] Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Shanghai 200031; [5] Artificial Intelligence Research, Chinese Academy of Sciences, Jiaozhou 266300

Source: Acta Automatica Sinica (《自动化学报》), 2023, No. 1, pp. 1-14 (14 pages)

Abstract: In vision-and-language navigation (VLN), an agent placed at a starting location in an unknown environment must interpret a natural-language instruction together with its visual surroundings, dynamically generate a sequence of actions, and finally reach the goal location. Owing to its broad application prospects, VLN has attracted growing attention in recent years, especially in multi-modal research. Compared with traditional multi-modal tasks such as visual question answering and image captioning, VLN is more challenging in terms of multi-modal fusion and dynamic reasoning. However, because of the limitations of conventional imitation learning and the scarcity of data, models suffer from insufficient generalization. This paper systematically reviews recent advances in VLN research. First, we briefly introduce the datasets and baseline models for VLN. Then, we comprehensively introduce representative methods, grouped into four aspects: data augmentation, search strategies, training methods, and action spaces. Finally, based on experiments on different datasets, we analyze and compare the strengths and weaknesses of existing models and outline possible future research directions.
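The task definition above (the agent repeatedly combines the instruction with its current observation to choose the next action, until it stops) can be illustrated with a minimal observe-act loop. This is a hypothetical sketch, not a method from the survey: the toy navigation graph, the `Episode` class, the landmark-list instruction format and the rule-based `greedy_policy` are all assumptions, with symbolic landmark labels standing in for visual features. A real VLN agent would replace the policy with a learned cross-modal model (for example, one trained by imitation learning).

```python
from dataclasses import dataclass

# Hypothetical toy navigation graph: viewpoint id -> {neighbouring viewpoint id: visible landmark label}.
# Symbolic labels stand in for the visual features a real VLN agent would perceive.
NavGraph = dict[str, dict[str, str]]

STOP = "STOP"


@dataclass
class Episode:
    graph: NavGraph
    start: str
    goal: str
    instruction: list[str]  # assumption: instruction pre-parsed into an ordered list of landmarks


def greedy_policy(instruction: list[str], step: int, observation: dict[str, str]) -> str:
    """Toy rule-based policy (not from the survey): move to the neighbour whose label
    matches the next landmark in the instruction; stop when none matches or the
    instruction is exhausted."""
    if step >= len(instruction):
        return STOP
    target = instruction[step]
    for neighbour, label in observation.items():
        if label == target:
            return neighbour
    return STOP


def run_episode(ep: Episode, max_steps: int = 10) -> tuple[list[str], bool]:
    """Observe-act loop: at each step the agent conditions on the instruction and its
    current observation, picks an action, and the episode ends on STOP or after
    max_steps. Returns the visited viewpoints and whether it stopped at the goal."""
    position = ep.start
    trajectory = [position]
    for step in range(max_steps):
        observation = ep.graph[position]  # what the agent "sees" from its current viewpoint
        action = greedy_policy(ep.instruction, step, observation)
        if action == STOP:
            break
        position = action
        trajectory.append(position)
    return trajectory, position == ep.goal


if __name__ == "__main__":
    graph: NavGraph = {
        "hall": {"kitchen_door": "kitchen", "stairs": "stairs"},
        "kitchen_door": {"hall": "hallway", "fridge": "fridge"},
        "stairs": {"hall": "hallway"},
        "fridge": {"kitchen_door": "kitchen"},
    }
    episode = Episode(graph, start="hall", goal="fridge", instruction=["kitchen", "fridge"])
    print(run_episode(episode))  # (['hall', 'kitchen_door', 'fridge'], True)
```

In this sketch an action is the choice of an adjacent viewpoint or STOP; that is only one possible action-space design, and action spaces are one of the four aspects the survey reviews.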

Keywords: vision-and-language navigation; vision-language understanding; cross-modal matching; embodied intelligence

Chinese Library Classification (CLC) number: TP391.41 [Automation and Computer Technology / Computer Application Technology]
