Bridging the Domain Gap in Grounded Situation Recognition via Unifying Event Extraction across Modalities  


Authors: Qingwen Liu, Zejun Li, Zhihao Fan, Cunxiang Yin, Yancheng He, Jing Cai, Jinhua Fu, Zhongyu Wei

Affiliations: [1] Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Handan Road, Shanghai 200433, China; [2] School of Data Science, Fudan University, Handan Road, Shanghai 200433, China; [3] Tencent, Nanshan District, Shenzhen 518057, China

Source: Data Intelligence, 2025, Issue 1, pp. 143-162 (20 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 62176058) and the National Key R&D Program of China (2023YFF1204800).

Abstract: Event extraction extracts event frames from text, while grounded situation recognition detects events in images. Because real-world applications frequently encounter unforeseen events, researchers have studied event extraction in both cross-domain and in-domain settings, whereas grounded situation recognition has been explored primarily in in-domain scenarios. In this paper, we therefore propose cross-domain grounded situation recognition and establish a new benchmark, SWiG-XD. In this more challenging setting, we deepen the connection between the two tasks, building on their underlying unity across two modalities, and explore how to transfer generalization ability from text to images. First, we use ChatGPT to automatically generate textual data of two kinds: one is matched directly with images, establishing a direct connection to them; the other covers all event types and generalizes more broadly. We then employ a unified model framework to associate textual concepts with local image features and to transfer cross-domain generalization across modalities through modality-shared prompts and a self-attention mechanism. Furthermore, we incorporate the more general textual data to further improve generalization on images. Experimental results on the newly constructed benchmark demonstrate the effectiveness of our method.

Keywords: Event argument extraction; Cross-domain generalization; Unified cross-modal framework; Modality-shared prompt; Grounded situation recognition

Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
