Event-Driven Attention Network: A Cross-Modal Framework for Efficient Image-Text Retrieval in Mass Gathering Events

Authors: Kamil Yasen, Heyan Jin, Sijie Yang, Li Zhan, Xuyang Zhang, Ke Qin, Ye Li

Affiliations: [1] School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China; [2] School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China; [3] Kashi Institute of Electronics and Information Industry, Kashi, 844508, China

Source: Computers, Materials & Continua (English-language journal), 2025, Issue 5, pp. 3277-3301 (25 pages)

Funding: Sponsored by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2024D01A19).

Abstract: Research on mass gathering events is critical for ensuring public security and maintaining social order. However, most existing work focuses on crowd behavior analysis tasks such as anomaly detection and crowd counting, and research on mass gathering behaviors is comparatively scarce. We believe real-time detection and monitoring of mass gathering behaviors are essential for mitigating potential security risks and emergencies. It is therefore imperative to develop a method capable of accurately identifying and localizing mass gatherings before disasters occur, enabling prompt and effective responses. To address this problem, we propose an innovative Event-Driven Attention Network (EDAN), which, for the first time, achieves image-text matching in the scenario of mass gathering events with good results. Traditional image-text retrieval methods based on global alignment struggle to capture local details within complex scenes, which limits retrieval accuracy. While local alignment-based methods are more effective at extracting detailed features, they frequently process raw textual features directly; these features often contain ambiguities and redundant information that can diminish retrieval efficiency and degrade model performance. To overcome these challenges, EDAN introduces an Event-Driven Attention Module that adaptively focuses attention on image regions or textual words relevant to the event type. By calculating the semantic distance between event labels and textual content, this module significantly reduces computational complexity and enhances retrieval efficiency. To validate the effectiveness of EDAN, we construct a dedicated multimodal dataset tailored for the analysis of mass gathering events, providing a reliable foundation for subsequent studies. We conduct comparative experiments with other methods on our dataset, and the experimental results demonstrate the effectiveness of EDAN. In the image-to-text retrieval task, EDAN achieved the best performance on the R@5 metric, while in the text-to-image retrieval task, it show
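
The abstract describes two ideas that a small example may make more concrete: weighting textual words by their semantic distance to an event label, and scoring retrieval with R@K. The Python sketch below is only an illustration under stated assumptions and is not the authors' implementation: it assumes cosine similarity as the semantic distance, a temperature-scaled softmax to turn similarities into attention weights, and toy two-dimensional embeddings. The function names event_word_weights and recall_at_k, and the temperature parameter, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def event_word_weights(event_vec, word_vecs, temperature=0.1):
    """Assumed scheme: softmax over cosine similarities to the event label,
    so words semantically closer to the event receive larger weights."""
    scores = [cosine(event_vec, w) / temperature for w in word_vecs]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recall_at_k(ranked_ids, relevant_ids, k):
    """R@K: fraction of queries whose top-k results contain a relevant item."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if set(ranked[:k]) & set(relevant)
    )
    return hits / len(ranked_ids)


# Toy embeddings (made up for illustration): "crowd", "the", "stadium".
event = [1.0, 0.0]
words = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.3]]
weights = event_word_weights(event, words)

# Two toy queries: the first has its relevant item in the top 2, the second does not.
score = recall_at_k([[3, 1, 2], [5, 4, 6]], [[1], [9]], k=2)
```

With these toy vectors, the word closest to the event label ("crowd") receives the largest weight and the unrelated word ("the") the smallest; in a real system the vectors would come from learned text and image encoders.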

Keywords: mass gathering events; image-text retrieval; attention mechanism

Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]
