A Cross-modal Video-Text Retrieval Model Based on the Cross-attention Mechanism


Authors: Wang Sheng[1], Song Xianghui[2], Hu Shixiong, Liang Yingli, Sun Xiaoliang

Affiliations: [1] Information Engineering University, Zhengzhou, Henan 450001; [2] Highway Science Research Institute, Ministry of Transport, Beijing 100088; [3] Yellow River Institute of Communications, Jiaozuo, Henan 454950; [4] Henan Zhonggong Design & Research Institute Group Co., Ltd., Zhengzhou, Henan 450018; [5] Beijing Zhongjiao Guotong Intelligent Transportation System Technology Co., Ltd., Beijing 100082

Source: Safety Health & Environment (《安全、健康和环境》), 2025, Issue 3, pp. 20-26 (7 pages)

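Funding: National Natural Science Foundation of China, General Program (62272480), "Research on Key Technologies for Visual Analysis of Network Asset Graphs of Black and Gray Market Industries".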

Abstract: Accurately identifying the causes of traffic accidents is critical to safety planning for dangerous goods transportation. Existing methods usually rely on the combined analysis of traffic accident reports, traffic surveillance video, and other text data, but their cross-modal retrieval accuracy and efficiency are low. To address this, a cross-modal retrieval model based on the cross-attention mechanism is proposed to improve cross-modal data retrieval in the analysis of dangerous goods transport accidents. The model fuses traffic surveillance video with text data such as accident reports and uses cross-attention to extract the correspondence between video and text, improving retrieval accuracy and efficiency. The architecture comprises data preprocessing, feature extraction, the cross-attention mechanism, multimodal feature fusion, fine-grained similarity computation, and an optimized loss function. Experimental results show that the proposed model outperforms the baseline models on all evaluation metrics; for example, it exceeds the baseline HiT by 2.53% on Recall@5 (the published English abstract instead reports a 1.3% gain over the best baseline, HiT, on Recall@1 and Recall@5 on the dangerous goods transport dataset). It also significantly outperforms existing cross-modal retrieval methods such as Contrastive Language-Image Pre-training (CLIP). Ablation experiments further verify the key role of the cross-attention mechanism in improving retrieval accuracy and efficiency. The study provides strong support for safety planning and accident prevention in dangerous goods transportation.
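The pipeline named in the abstract (cross-attention between text and video features, a similarity score, and Recall@K evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 3-dimensional features, the mean pooling, and the cosine score are all assumptions made for illustration, and the learned projections, multimodal fusion module, and loss function described in the paper are omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Each query (e.g. a text token) attends over all key/value vectors
    (e.g. video frames). Keys and values are the same vectors here;
    learned Q/K/V projections are omitted (assumption for brevity)."""
    dim = len(keys_values[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        attended.append([sum(w * v[d] for w, v in zip(weights, keys_values))
                         for d in range(dim)])
    return attended

def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def similarity(text_tokens, video_frames):
    """Score one text-video pair: text tokens attend over the frames,
    then the pooled text and pooled attended vectors are compared."""
    attended = cross_attention(text_tokens, video_frames)
    return cosine(mean_pool(text_tokens), mean_pool(attended))

def recall_at_k(sim_matrix, k):
    """sim_matrix[i][j] scores query i against candidate j; the correct
    candidate for query i is assumed to be candidate i."""
    hits = 0
    for i, row in enumerate(sim_matrix):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(sim_matrix)

# Toy features (hypothetical): two texts and two videos, 3-dim vectors.
texts = [
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
]
videos = [
    [[1.0, 0.0, 0.0], [0.8, 0.0, 0.2], [0.0, 0.0, 1.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],
]
sim = [[similarity(t, v) for v in videos] for t in texts]
print(recall_at_k(sim, 1))
```

In this sketch Recall@K is the fraction of text queries whose matching video appears among the K highest-scoring candidates, which is the sense in which the Recall@1 and Recall@5 figures above are usually read.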

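Keywords: dangerous goods transportation; cross-modal retrieval; traffic surveillance; cross-attention mechanism; accident analysis; task planning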

Classification code: TP181.5 [Automation and Computer Technology: Control Theory and Control Engineering]

 
