BSGAN-GP: a semi-supervised image recognition model driven by class balancing

Authors: Hu Jing[1]; Zhang Rumin; Lian Bingquan (School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China)

Affiliation: [1] School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China

Source: Journal of Image and Graphics (《中国图象图形学报》), 2025, No. 1, pp. 95-109 (15 pages)

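Funding: National Natural Science Foundation of China (32071775); Natural Science Foundation of Shanxi Province (202203021211189); enterprise-commissioned project (2021035).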

Abstract: Objective: Existing deep learning image recognition models rely heavily on large amounts of data labeled by hand by specialists; such expert labels are hard to obtain, and manual annotation is expensive. Most real-world datasets are also imbalanced: a severe skew between positive and negative samples biases the fitted model toward the majority classes, leaving recognition accuracy on minority classes inadequate. These issues seriously hinder the wide application of deep learning to practical image recognition. Method: Building on the semi-supervised generative adversarial network, a new balanced model architecture, BSGAN-GP (balancing semi-supervised generative adversarial network-gradient penalty), is proposed so that the discriminator of the semi-supervised GAN can judge every class fairly. The proposed class balancing random selection (CBRS) algorithm addresses the low minority-class recognition accuracy caused by uneven class distribution in image samples. Labeled real data are randomly selected by class so that every class contributes the same number of labeled inputs. The generator NetG, with its trained parameters fixed, then generates an equal number of fake samples for each class as input to the discriminator, and updating the discriminator NetD in this way ensures that it judges all classes fairly. BSGAN-GP also adds a gradient penalty term to the discriminator loss function, which makes training more stable. Results: Experiments on three mainstream datasets compared the model with nine image recognition methods (six semi-supervised and three fully supervised). To demonstrate the improvement in minority-class recognition accuracy, imbalanced versions of the three datasets were constructed. On Fashion-MNIST, overall accuracy improved by 3.281% over the baseline model and the minority-class recognition rate improved by 7.14%; on MNIST, the recognition rates of the four minority classes improved by 2.68% to 7.40% over the baseline model; on SVHN (street view house number), overall accuracy improved by 3.515% over the baseline model. Synthetic image quality was also compared on the three datasets to verify the effectiveness of the CBRS algorithm, and the improved quality and quantity of minority-class synthetic images demonstrate its effect. Ablation experiments evaluated the importance of the proposed CBRS module and the introduced module within the network. Conclusion: This…

Objective: With improved algorithm performance and advances in computer hardware, image classification technology has achieved high-precision automatic classification and screening of digital images. This technology uses a computer to analyze an image quantitatively, assigning the image or each region in it to one of several categories in place of human visual interpretation. In practice, however, obtaining high-accuracy classification results requires a large number of training samples and high-quality annotation information. For large-scale image datasets, existing annotation methods, such as polygon annotation and key point annotation, must be performed manually by industry experts. Because expert annotation is costly and high-quality annotation is difficult, little image data is labeled, which seriously hinders the development of deep learning in computer vision. To this end, the semi-supervised generative adversarial network (GAN) paradigm was proposed, because it can use a large amount of unlabeled data to capture the distribution characteristics of real samples in the feature space and determine classification boundaries more accurately. Generative semi-supervised GAN models, such as DCGAN and semi-supervised GAN, can create new samples and increase sample diversity and are therefore widely used in various fields. However, such models are often unstable in adversarial training; on an unbalanced dataset in particular, the gradient can easily fall into the trap of predicting the majority of the data. Image datasets in real-world industrial applications are often class-imbalanced, and this imbalance degrades classifier accuracy. Several recent studies have shown that GANs such as DAGAN, BSSGAN, BAGAN, and improve-BAGAN are effective in alleviating the imbalance problem. Among them, BAGAN acts as an augmentation method to restore balance in unbalanced datasets, which can learn useful features from most classes an…
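The per-class selection step that the abstract attributes to CBRS can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `class_balanced_random_selection`, its parameters, the default of using the smallest class size, and the fallback of sampling with replacement when a class has fewer labeled samples than requested are all hypothetical. The abstract states only that labeled data are randomly selected by class so that every class contributes the same number of samples.

```python
import random
from collections import defaultdict


def class_balanced_random_selection(labels, n_per_class=None, seed=None):
    """Return indices into `labels` with the same number of picks per class.

    Hypothetical helper: the name, signature, and with-replacement fallback
    are assumptions for illustration, not the paper's published procedure.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Assumption: default to the size of the smallest class.
    if n_per_class is None:
        n_per_class = min(len(pool) for pool in by_class.values())

    selected = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) >= n_per_class:
            # Enough samples: draw without replacement.
            picks = rng.sample(pool, n_per_class)
        else:
            # Assumption: minority classes are oversampled with replacement.
            picks = [rng.choice(pool) for _ in range(n_per_class)]
        selected.extend(picks)

    rng.shuffle(selected)  # avoid class-ordered batches
    return selected


# Imbalanced toy labels: class 1 is the minority class.
labels = [0] * 50 + [1] * 5 + [2] * 20
batch = class_balanced_random_selection(labels, n_per_class=8, seed=0)
counts = {c: sum(1 for i in batch if labels[i] == c) for c in set(labels)}
```

In this toy run every class ends up with eight entries in `batch`, even though class 1 has only five labeled samples. Whether the paper oversamples, undersamples, or caps the per-class count is not stated in the abstract.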

Keywords: deep learning; semi-supervised learning (SSL); generative adversarial network (GAN); imbalanced image recognition; gradient penalty

CLC number: TP391.41 [Automation and Computer Technology: Computer Application Technology]

 
