Multimodal named entity recognition in social media based on graphic robust representation

Authors: YUAN Yiming, GUO Junjun, YU Zhengtao [1,2]

Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; [2] Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Source: Microelectronics & Computer, 2025, No. 2, pp. 50-58 (9 pages)

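Funding: National Natural Science Foundation of China (62366025); Yunnan Province Major Science and Technology Special Project (202202AE090008-3); Natural Science Foundation of the Yunnan Provincial Department of Science and Technology (202301AT070444).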

Abstract: Multimodal named entity recognition (MNER) aims to improve the recognition of textual entities by incorporating visual information from images. Previous MNER research has focused mainly on multimodal fusion methods. However, a text and its accompanying image do not always match fully, so image-text alignment noise is usually unavoidable, and irrelevant image regions can mislead the textual representation and degrade model performance. To address this issue, this paper proposes a noise-robust MNER approach based on a Cross-Modal Semantic Interaction Mask model (CMSIM). The method uses a cross-modal interaction mask mechanism to construct a text-image relation-aware attention mask matrix; it then uses this text-image interaction mask to filter out visual noise and fuse robust textual and visual features, thereby improving named entity recognition performance. Experimental results on two public datasets show that the model improves entity recognition accuracy on the MNER task, demonstrating the effectiveness of the proposed method.
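The abstract does not give CMSIM's equations, so the following is only a minimal sketch of the general idea of masked cross-modal attention, under stated assumptions, and not the authors' implementation. It assumes that each text token and each image region is already encoded as a vector of the same dimension, that relevance is a scaled dot product, that a (token, region) pair is masked out when its score falls below a hypothetical `threshold`, and that fusion is simple addition of the attended visual vector to the token vector. The names `interaction_mask` and `masked_fusion` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def interaction_mask(text: List[Vector], regions: List[Vector],
                     threshold: float) -> List[List[bool]]:
    """Hypothetical text-image relation mask.

    Entry [i][j] is True when token i and region j are judged related,
    i.e. their scaled dot-product score reaches `threshold` (an assumption).
    """
    scale = math.sqrt(len(text[0]))
    return [[dot(t, r) / scale >= threshold for r in regions] for t in text]


def masked_fusion(text: List[Vector], regions: List[Vector],
                  threshold: float = 0.0) -> List[Vector]:
    """Attend from each token to unmasked regions only, then add (assumed fusion)."""
    mask = interaction_mask(text, regions, threshold)
    scale = math.sqrt(len(text[0]))
    fused = []
    for t, row in zip(text, mask):
        scores = [dot(t, r) / scale for r in regions]
        kept = [s for s, keep in zip(scores, row) if keep]
        if not kept:
            # Every region is treated as noise for this token:
            # keep the text feature alone instead of injecting visual noise.
            fused.append(list(t))
            continue
        m = max(kept)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) if keep else 0.0
                   for s, keep in zip(scores, row)]
        z = sum(weights)
        visual = [sum(w * r[d] for w, r in zip(weights, regions)) / z
                  for d in range(len(t))]
        fused.append([a + b for a, b in zip(t, visual)])
    return fused


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]     # two toy token vectors
    regions = [[2.0, 0.0], [-1.0, -1.0]]  # two toy region vectors
    print(masked_fusion(tokens, regions, threshold=0.5))  # [[3.0, 0.0], [0.0, 1.0]]
```

In this toy run the first token matches only the first region, so only that region contributes, while the second token matches neither region and keeps its text feature unchanged. That fallback is one simple way to stop unrelated image regions from influencing a token; the paper's actual masking and fusion may differ.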

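Keywords: multimodal named entity recognition; modal representation; alignment noise; bidirectional interactive attention mechanism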

CLC number: TP391 (Automation and Computer Technology / Computer Application Technology)
