Authors: WANG Hui [1,2]; HUANG Yu-Ting; XIA Yu-Ting; FAN Zi-Zhu; LUO Guo-Liang; YANG Hui [2]
Affiliations: [1] School of Information and Software Engineering, East China Jiaotong University, Nanchang 330013; [2] State Key Laboratory of Performance Monitoring and Protecting of Rail Transit Infrastructure, East China Jiaotong University, Nanchang 330013; [3] School of Software Technology, Zhejiang University, Ningbo 315048; [4] College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306
Source: Acta Automatica Sinica, 2025, No. 2, pp. 445-456 (12 pages)
Funding: Supported by the National Natural Science Foundation of China (61991401, U2034211, 61991404), the Natural Science Foundation of Jiangxi Province (20224BAB212014, 20232ABC03A04), and the Open Project of the State Key Laboratory of CAD&CG (A2334).
Abstract: Classification methods based on deep neural networks (DNN) lack interpretability, which makes it difficult for them to gain full trust in critical fields such as finance, medicine, and law, greatly limiting their application. Most existing research focuses on the interpretability of unimodal data; the interpretability of multimodal data remains challenging. To address this issue, a multimodal interpretable image classification method based on visual attributes is proposed. The method incorporates attributes extracted from different visual modalities, such as visible-light and depth images, into the model's training process. It not only explains an existing black-box neural network through visual attributes and a decision tree, but also further strengthens the model's ability to convey interpretive information during training. Although introducing interpretability usually reduces model accuracy, the proposed method retains good interpretability while still achieving high classification accuracy: on the NYUDv2, SUN RGB-D, and RGB-NIR datasets, its accuracy is markedly higher than that of unimodal interpretable methods and comparable to that of multimodal non-interpretable models.
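The abstract describes two mechanisms: fusing per-modality visual-attribute scores (e.g. from visible-light and depth images) and explaining predictions with a decision tree over those attributes. The toy sketch below illustrates that idea only; the attribute names, thresholds, and labels are hypothetical and are not taken from the paper.

```python
# Illustrative sketch (not the paper's implementation): fuse visual-attribute
# scores from two modalities, then classify with a tiny hand-written decision
# tree whose rule path makes every prediction human-readable.
# All attribute names, thresholds, and class labels below are hypothetical.

def fuse_attributes(rgb_attrs, depth_attrs):
    """Concatenate attribute scores from two visual modalities,
    prefixing each attribute with its source modality."""
    return {**{f"rgb:{k}": v for k, v in rgb_attrs.items()},
            **{f"depth:{k}": v for k, v in depth_attrs.items()}}

def tree_predict(attrs):
    """Two-level decision tree over fused attributes.
    Returns (label, rule_path) so the decision can be inspected."""
    path = []
    if attrs["depth:flat_surface"] > 0.5:
        path.append("depth:flat_surface > 0.5")
        if attrs["rgb:wood_texture"] > 0.5:
            path.append("rgb:wood_texture > 0.5")
            return "table", path
        path.append("rgb:wood_texture <= 0.5")
        return "bed", path
    path.append("depth:flat_surface <= 0.5")
    return "chair", path

# Example: one image's attribute scores from each modality.
fused = fuse_attributes({"wood_texture": 0.9}, {"flat_surface": 0.8})
label, path = tree_predict(fused)
```

Here `label` is "table" and `path` records the two threshold tests that led there; in the paper's setting the attribute scores would come from trained per-modality networks rather than being hand-set, which is what lets the tree explain the black-box model's decision.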
Keywords: interpretability; visual attributes; multimodal fusion; decision tree; image classification
Classification code: TP391.41 (Automation and Computer Technology: Computer Application Technology)