Authors: Qingwen Liu, Zejun Li, Zhihao Fan, Cunxiang Yin, Yancheng He, Jing Cai, Jinhua Fu, Zhongyu Wei
Affiliations: [1] Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Handan Road, Shanghai 200433, China; [2] School of Data Science, Fudan University, Handan Road, Shanghai 200433, China; [3] Tencent, Nanshan District, Shenzhen 518057, China
Source: Data Intelligence (数据智能, English edition), 2025, No. 1, pp. 143-162 (20 pages)
Funding: Supported by the National Natural Science Foundation of China (No. 62176058) and the National Key R&D Program of China (2023YFF1204800).
Abstract: Event extraction extracts event frames from text, while grounded situation recognition detects events in images. Because real-world applications frequently encounter a multitude of unforeseen events, researchers have studied event extraction in both cross-domain and in-domain settings, whereas grounded situation recognition has primarily explored in-domain scenarios. Therefore, in this paper, we propose cross-domain grounded situation recognition and establish a new benchmark, SWiG-XD. In this more challenging setting, we deepen the connection between the two tasks based on their underlying unity across two different modalities and explore how to transfer generalization ability from text to images. First, we use ChatGPT to automatically generate textual data, which falls into two categories. One category is directly matched with images, establishing a direct connection with them. The other category covers all event types and generalizes more broadly. We then employ a unified model framework to associate textual concepts with local image features and achieve cross-domain generalization transfer across modalities through modality-shared prompts and a self-attention mechanism. Furthermore, we incorporate the more general textual data to further improve generalization on images. Experimental results on the newly constructed benchmark demonstrate the effectiveness of our method.
Keywords: Event argument extraction; Cross-domain generalization; Unified cross-modal framework; Modality-shared prompt; Grounded situation recognition
Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]