Multi-Scale Feature Fusion and Advanced Representation Learning for Multi-Label Image Classification

Authors: Naikang Zhong, Xiao Lin, Wen Du, Jin Shi

Affiliations:
[1] Institute of Artificial Intelligence on Education Research, College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai, 200234, China
[2] Lab for Educational Big Data and Policymaking, Ministry of Education, Shanghai Normal University, Shanghai, 200234, China
[3] Shanghai Intelligent Education Big Data Engineering Technology Research Center, Shanghai Normal University, Shanghai, 200234, China
[4] Shanghai Online Education Research Base for Primary and Secondary Schools, Shanghai, 200234, China
[5] DS Information Technology Co., Ltd., Shanghai, 200032, China
[6] Faculty of Innovation Engineering, Macao University of Science and Technology, Macao, 999078, China

Source: Computers, Materials & Continua, 2025, No. 3, pp. 5285-5306 (22 pages)

Funding: Supported by the National Natural Science Foundation of China (62302167, 62477013); the Natural Science Foundation of Shanghai (No. 24ZR1456100); the Science and Technology Commission of Shanghai Municipality (No. 24DZ2305900); and the Shanghai Municipal Special Fund for Promoting High-Quality Development of Industries (2211106).

Abstract: Multi-label image classification is a challenging task because objects in an image vary widely in size and often appear against complex backgrounds. Obtaining precise class-specific representations at different scales is therefore a key aspect of feature representation. However, existing methods often rely on a single-scale deep feature and neglect shallow- and deeper-layer features, which makes it difficult to predict objects of varying scales within the same image. Although some studies have explored multi-scale features, they rarely address the flow of information between scales or efficiently obtain precise class-specific representations for features at different scales. To address these issues, we propose a two-stage, three-branch Transformer-based framework. The first stage incorporates multi-scale image feature extraction and hierarchical scale attention. This design enables the model to consider objects at various scales while enhancing the flow of information across feature scales, improving the model's generalization to diverse object scales. The second stage includes a global feature enhancement module and a region selection module. The global feature enhancement module strengthens interconnections between different image regions, mitigating the issue of incomplete representations, while the region selection module models the cross-modal relationships between image features and labels. Together, these components enable the efficient acquisition of precise class-specific feature representations. Extensive experiments on public datasets, including COCO2014, VOC2007, and VOC2012, demonstrate the effectiveness of the proposed method, which achieves consistent performance gains of 0.3%, 0.4%, and 0.2% over state-of-the-art methods on the three datasets, respectively. These results validate the reliability and superiority of our approach for multi-label image classification.
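The abstract does not give the equations of the region selection module, so the following is only a minimal sketch of the general idea of modeling label-to-image relationships with cross-attention, in the style of label-query Transformer decoders. Everything here is an assumption for illustration and not the authors' method: the hypothetical function names (`label_cross_attention`, `multi_label_probs`), the absence of learned query/key/value projections, and the choice to average per-class logits across scales. Each class owns a query vector that attends over the region tokens of one feature scale; the attended feature is scored by a per-class linear classifier, and an independent sigmoid turns each logit into a probability, since several labels can be present in the same image.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def label_cross_attention(label_queries: List[Vector], regions: List[Vector]) -> List[Vector]:
    """Hypothetical scaled dot-product attention with one query per class.

    Keys and values are the raw region features (no learned projections),
    so each output is a convex combination of the region vectors.
    """
    d = len(regions[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in label_queries:
        weights = softmax([dot(q, r) * scale for r in regions])
        outputs.append([sum(w * r[k] for w, r in zip(weights, regions)) for k in range(d)])
    return outputs


def multi_label_probs(
    label_queries: List[Vector],
    scales: List[List[Vector]],
    classifiers: List[Vector],
    biases: List[float],
) -> List[float]:
    """Hypothetical multi-scale fusion: attend over each scale's region tokens,
    score class c with its own linear classifier, average the logits over
    scales, then apply an independent sigmoid per class."""
    num_classes = len(label_queries)
    logits = [0.0] * num_classes
    for regions in scales:
        class_feats = label_cross_attention(label_queries, regions)
        for c in range(num_classes):
            logits[c] += dot(classifiers[c], class_feats[c]) + biases[c]
    return [sigmoid(z / len(scales)) for z in logits]


if __name__ == "__main__":
    # Toy setup: 2 classes, 2-D features, two "scales" with different token counts.
    queries = [[1.0, 0.0], [0.0, 1.0]]
    scales = [
        [[4.0, 0.0], [0.0, 4.0]],              # coarser scale: 2 region tokens
        [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]],  # finer scale: 3 region tokens
    ]
    classifiers = [[1.0, 0.0], [0.0, 1.0]]
    print(multi_label_probs(queries, scales, classifiers, biases=[0.0, -5.0]))
```

The per-class sigmoid (rather than a softmax over classes) is the usual choice for multi-label outputs, because the class probabilities do not have to sum to one and each label can be thresholded independently.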

Keywords: image classification; multi-label; multi-scale; attention mechanisms; feature fusion

CLC number: G63 [Culture and Science / Education]

 
