Authors: WANG Hui [1,2]; HUANG Yu-Ting; XIA Yu-Ting; FAN Zi-Zhu; LUO Guo-Liang; YANG Hui [2]
Affiliations: [1] School of Information and Software Engineering, East China Jiaotong University, Nanchang 330013; [2] State Key Laboratory of Performance Monitoring and Protecting of Rail Transit Infrastructure, East China Jiaotong University, Nanchang 330013; [3] School of Software Technology, Zhejiang University, Ningbo 315048; [4] College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306
Source: Acta Automatica Sinica, 2025, No. 2, pp. 445-456 (12 pages)
Funding: Supported by the National Natural Science Foundation of China (61991401, U2034211, 61991404), the Natural Science Foundation of Jiangxi Province (20224BAB212014, 20232ABC03A04), and the Open Project of the State Key Laboratory of CAD&CG (A2334).
Abstract: Classification methods based on deep neural networks (DNN) lack interpretability, which makes it difficult for them to gain full trust in critical fields such as finance, medicine, and law, greatly limiting their application. Most existing research focuses on the interpretability of unimodal data; the interpretability of multimodal data remains challenging. To address this issue, a multimodal interpretable image classification method based on visual attributes is proposed. The method incorporates attributes extracted from different visual modalities, such as visible-light and depth images, into the model's training process. It not only explains an existing black-box neural network through visual attributes and a decision tree, but also further strengthens the model's ability to convey interpretive information during training. Although introducing interpretability usually reduces model accuracy, the proposed method retains good interpretability while still achieving high classification accuracy: on the NYUDv2, SUN RGB-D, and RGB-NIR datasets, its accuracy is markedly higher than that of unimodal interpretable methods and comparable to that of multimodal non-interpretable models.
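The abstract describes two mechanisms: fusing per-modality visual-attribute scores (e.g. from visible-light and depth images) and explaining predictions with a decision tree over those attributes. The toy sketch below illustrates that idea only; the attribute names, thresholds, and labels are hypothetical and are not taken from the paper.

```python
# Illustrative sketch (not the paper's implementation): fuse visual-attribute
# scores from two modalities, then classify with a tiny hand-written decision
# tree whose rule path makes every prediction human-readable.
# All attribute names, thresholds, and class labels below are hypothetical.

def fuse_attributes(rgb_attrs, depth_attrs):
    """Concatenate attribute scores from two visual modalities,
    prefixing each attribute with its source modality."""
    return {**{f"rgb:{k}": v for k, v in rgb_attrs.items()},
            **{f"depth:{k}": v for k, v in depth_attrs.items()}}

def tree_predict(attrs):
    """Two-level decision tree over fused attributes.
    Returns (label, rule_path) so the decision can be inspected."""
    path = []
    if attrs["depth:flat_surface"] > 0.5:
        path.append("depth:flat_surface > 0.5")
        if attrs["rgb:wood_texture"] > 0.5:
            path.append("rgb:wood_texture > 0.5")
            return "table", path
        path.append("rgb:wood_texture <= 0.5")
        return "bed", path
    path.append("depth:flat_surface <= 0.5")
    return "chair", path

# Example: one image's attribute scores from each modality.
fused = fuse_attributes({"wood_texture": 0.9}, {"flat_surface": 0.8})
label, path = tree_predict(fused)
```

Here `label` is "table" and `path` records the two threshold tests that led there; in the paper's setting the attribute scores would come from trained per-modality networks rather than being hand-set, which is what lets the tree explain the black-box model's decision.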
Keywords: interpretability; visual attributes; multimodal fusion; decision tree; image classification
Classification code: TP391.41 (Automation and Computer Technology: Computer Application Technology)