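Authors: CHEN Wei [1], JIAN Chuan-xia [2]
Affiliations: [1] School of Art, Ningbo City College of Vocational Technology, Ningbo 315100, China; [2] College of Electromechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China
Source: Digital Printing (《数字印刷》), 2022, No. 2, pp. 52-60 (9 pages)
Funding: Scientific Research Project of the Zhejiang Provincial Department of Education (No. Y202147591); Guangdong Provincial Key Laboratory of Cyber-Physical Systems project (No. 2016B030301008); Guangdong University of Technology Youth Fund Key Project (No. 17QNZD001); College Students' Innovation and Entrepreneurship Training Program (No. yj202111845031).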
Abstract: When only a small number of labeled samples are available for training, the performance of printing registration recognition models degrades. To address this, the study proposes a semi-supervised method that combines oversampling of safe samples as a preprocessing step with co-training. First, the k-nearest neighbor method is used to identify safe samples in the training set. Oversampling among the safe samples generates synthetic samples, which are combined with the original training set to form a new training set. Bootstrap sampling then divides the new training set into three training subsets, and a decision tree base classifier is trained on each. The base classifiers repeatedly predict labels for unlabeled samples, which are added to the training subsets to update the classifiers, until their performance is stable. The three base classifiers are then integrated into the final classification model for printing registration recognition. Experimental results show that the classification performance of the proposed method improves gradually as the number of training samples increases. With 800 training samples, it reaches a classification accuracy (Accuracy) of 98% and a geometric mean of recalls (G-mean) of 99% on the test set, both higher than the other methods in the experiment given the same number of training samples. The proposed method effectively exploits unlabeled samples to improve model performance and enables printing registration recognition with a small training set.
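The preprocessing step and the G-mean metric described in the abstract can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation. The abstract does not define "safe" precisely, so the rule used here (a sample is safe when more than half of its k nearest neighbors share its label) is an assumption, as is the SMOTE-style linear interpolation between two safe samples of the same class. All function names are hypothetical.

```python
import math
import random


def knn_indices(X, i, k):
    """Indices of the k nearest neighbors of X[i] (Euclidean), excluding i itself."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]


def safe_sample_indices(X, y, k=3):
    """Assumed rule: a sample is 'safe' if more than half of its
    k nearest neighbors carry the same label as the sample."""
    safe = []
    for i in range(len(X)):
        same = sum(1 for j in knn_indices(X, i, k) if y[j] == y[i])
        if same > k / 2:
            safe.append(i)
    return safe


def oversample_safe(X, y, safe, n_new, seed=0):
    """Assumed oversampling: interpolate between two randomly chosen
    safe samples of the same class (SMOTE-style)."""
    rng = random.Random(seed)
    by_class = {}
    for i in safe:
        by_class.setdefault(y[i], []).append(i)
    # Only classes with at least two safe samples can be interpolated.
    candidates = [c for c, idx in by_class.items() if len(idx) >= 2]
    new_X, new_y = [], []
    if not candidates:
        return new_X, new_y
    for _ in range(n_new):
        c = rng.choice(candidates)
        a, b = rng.sample(by_class[c], 2)
        t = rng.random()  # position along the segment from X[a] to X[b]
        new_X.append(tuple(xa + t * (xb - xa) for xa, xb in zip(X[a], X[b])))
        new_y.append(c)
    return new_X, new_y


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1 / len(recalls))
```

The synthetic samples returned by `oversample_safe` would be appended to the original training set before the Bootstrap split into three subsets. The co-training loop with decision trees is left out of this sketch, since the abstract does not specify how predictions on unlabeled samples are accepted into the subsets.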
Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]