Development and Application of a Large Vision Model for Railway Industry

Authors: DAI Mingrui; LI Wenhao; SHI Weifeng; LI Guohua; YANG Taocun; DU Wenran (Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China)

Affiliation: [1] Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China

Source: China Railway (《中国铁路》), 2025, No. 1, pp. 1-12 (12 pages)

Funding: Science and Technology Research and Development Program of China State Railway Group Co., Ltd. (P2023S001).

Abstract: Vision applications in the railway sector often face complex, changing scenes and a limited number of effective samples. Designing a separate small model for each scenario is time-consuming and resource-intensive, and it rarely meets business requirements, so building a large vision model for the railway industry is of considerable value. This paper explores potential application scenarios for large vision models in the railway sector and proposes a scheme for building one. The model is based on the UPerNet network: InternImage replaces the original backbone to better capture target details in images; Semantic-Aware Normalization (SAN) and Semantic-Aware Whitening (SAW) attention mechanisms replace the original pyramid pooling module to improve overall robustness; and a fusion of spatial attention and channel attention replaces the original decoding part, so that attention to different regions is adjusted dynamically. Finally, a set of railway scene datasets is built through semi-automatic annotation. Experimental results indicate that the proposed improved UPerNet_InternImage large vision model for the railway industry has potential to improve segmentation accuracy and robustness, and that it converges faster and performs better when subsequently applied to segmentation tasks in specific scenarios, offering new ideas and methods for addressing problems in railway vision scenarios.
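The abstract mentions fusing channel attention with spatial attention so that the decoder can reweight regions dynamically. As a rough illustration of that general idea only, the NumPy sketch below applies channel weights and then spatial weights to one feature map. It is not the paper's design: the sequential ordering, the two-layer MLP, the scalar mixing weights `k` (a stand-in for a learned convolution), and all function names are assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W). Returns per-channel weights of shape (C,)."""
    avg = feat.mean(axis=(1, 2))   # global average pooling over H, W
    mx = feat.max(axis=(1, 2))     # global max pooling over H, W

    def mlp(v):
        # Shared bottleneck MLP with a ReLU in between (assumed form).
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))

def spatial_attention(feat, k):
    """feat: (C, H, W). Returns per-pixel weights of shape (H, W)."""
    avg = feat.mean(axis=0)        # channel-wise mean map
    mx = feat.max(axis=0)          # channel-wise max map
    # Simplification: mix the two maps with scalars instead of a conv.
    return sigmoid(k[0] * avg + k[1] * mx)

def fused_attention(feat, w1, w2, k):
    """Apply channel weights first, then spatial weights."""
    refined = feat * channel_attention(feat, w1, w2)[:, None, None]
    return refined * spatial_attention(refined, k)[None, :, :]

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
k = np.array([0.5, 0.5])

out = fused_attention(feat, w1, w2, k)
print(out.shape)
```

Because both sets of weights come from a sigmoid, every output value is the input value scaled by a factor between 0 and 1. In a trained network the weights would be learned, so informative channels and regions would be attenuated less than the rest.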

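Keywords: artificial intelligence; deformable convolution; attention mechanism; semantic segmentation; large vision model; railway industry large model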

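Classification: U29-39 [Transportation Engineering: Transportation Planning and Management]; TP18 [Transportation Engineering: Road and Railway Engineering]; TP391.4 [Automation and Computer Technology: Control Theory and Control Engineering]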

 
