Authors: KANG Pingping; HOU Jing[2,3]; ZHOU Haoran; CHEN Zirui; LI Chen[1,3]
Affiliations: [1] School of Computer and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, Sichuan, China; [2] Laboratory of Intelligent Perception and Smart Operation & Maintenance, School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, Sichuan, China; [3] National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
Source: Microelectronics & Computer, 2022, No. 5, pp. 10-19 (10 pages)
Funding: Sichuan Science and Technology Program (2020SYSY0016); National Key Research and Development Program of China (2020YFB1711902).
Abstract: Traditional multi-label image classification models struggle to generate high-level image features that lie close to the relevant labels, and they do not exploit the visual correlation between labels, which limits recognition accuracy. To address these problems, this paper proposes a multi-label image classification algorithm based on spatial attention and graph convolution. First, a graph convolutional network learns the features of the label adjacency graph, with label embeddings obtained from the label sequence by the GloVe algorithm. Second, an improved spatial attention network is introduced into the high-level semantic information to recalibrate the semantic features of specific categories and to suppress background and interference information. Finally, in a classifier based on co-occurrence feature fusion, the high-level semantic information is integrated with the label co-occurrence features extracted by the graph convolutional network, and the final prediction is made in a channel-wise one-to-one manner. Experiments on two public datasets show that the proposed algorithm reaches an average precision of 81.42% on MS-COCO and 94.3% on VOC-2007, which is 1.13 and 1.3 percentage points higher, respectively, than the baseline MLGCN network. The model has only one-eighth as many parameters as the original model and needs far fewer training iterations, which greatly reduces its training cost.
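The three steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the improved spatial attention network or the channel-wise one-to-one fusion, so the sketch substitutes a plain softmax over spatial positions and an ML-GCN-style dot product between per-label classifiers and the pooled image feature. All function names, matrix sizes, and numeric values are hypothetical, and the label embeddings stand in for GloVe vectors.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a label co-occurrence matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(a_norm, h, w, relu=True):
    """One graph convolution: activation(A_norm @ H @ W)."""
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in out] if relu else out

def spatial_attention_pool(feature_map):
    """Pool a list of per-position feature vectors into one vector.

    Assumed stand-in attention: softmax over positions of the mean
    channel activation (the paper's improved module is not described).
    """
    scores = [sum(pos) / len(pos) for pos in feature_map]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    channels = len(feature_map[0])
    return [sum(w * pos[c] for w, pos in zip(weights, feature_map)) for c in range(channels)]

def predict(feature_map, label_embeddings, adj, w1, w2):
    """Return one sigmoid probability per label."""
    a_norm = normalize_adjacency(adj)
    hidden = gcn_layer(a_norm, label_embeddings, w1)
    classifiers = gcn_layer(a_norm, hidden, w2, relu=False)  # one row per label
    pooled = spatial_attention_pool(feature_map)
    # Assumed fusion: dot product of each label classifier with the pooled feature.
    logits = [sum(c * f for c, f in zip(row, pooled)) for row in classifiers]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Toy inputs (all hypothetical): 3 labels, 2-d embeddings, 2-channel features.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
label_embeddings = [[0.1, 0.3], [0.2, -0.1], [-0.4, 0.5]]
w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]        # maps 2 -> 3 dimensions
w2 = [[0.2, 0.1], [-0.3, 0.4], [0.6, -0.7]]      # maps 3 -> 2 dimensions
feature_map = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [0.2, 0.1]]  # 4 positions

probs = predict(feature_map, label_embeddings, adj, w1, w2)
print(probs)
```

In this arrangement the graph convolution runs only over the label graph, so its output acts as a set of label-dependent classifiers applied to the image feature; the weights here are fixed by hand, whereas a real model would learn them.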
Keywords: graph convolutional network; multi-label image classification; spatial attention; feature fusion
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]