Authors: Zhang Yong[1], Chen Rongrong, Zhang Jing (School of Computer & Information Technology, Liaoning Normal University, Dalian, Liaoning 116081)
Affiliation: [1] School of Computer and Information Technology, Liaoning Normal University, Dalian, Liaoning 116081
Source: Journal of Computer Research and Development, 2021, No. 1, pp. 60-69 (10 pages)
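Funding: National Natural Science Foundation of China (61772252, 61902165); Program for Supporting Innovative Talents in Higher Education Institutions of Liaoning Province (LR2017044); Natural Science Foundation of Liaoning Province (2019-MS-216).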
Abstract: Semi-supervised learning methods improve learning performance by using a small amount of labeled data together with a large amount of unlabeled data. Tri-training is a classic disagreement-based semi-supervised learning method: it requires neither redundant views of the dataset nor any particular type of base classifier, and it has therefore become the most widely used technique among disagreement-based semi-supervised methods. However, Tri-training may introduce label noise during learning, which harms the final model. To reduce the prediction bias that this noise causes on unlabeled data and to learn a better semi-supervised classification model, we replace the error rate with cross entropy, which better reflects the gap between the model's predictions and the true distribution, and combine it with convex optimization to reduce label noise and safeguard model performance. On this basis, we propose three algorithms: a cross-entropy-based Tri-training algorithm, a safe Tri-training algorithm, and a cross-entropy-based safe Tri-training algorithm. The effectiveness of the proposed methods is verified on benchmark datasets such as the UCI (University of California Irvine) machine learning repository, and their performance is further confirmed from a statistical point of view with significance tests. The experimental results show that the proposed semi-supervised learning methods outperform the traditional Tri-training algorithm in classification performance, and that the cross-entropy-based safe Tri-training algorithm achieves higher classification performance and generalization ability.
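The abstract's central idea is that cross entropy measures the gap between predicted and true distributions more finely than the error rate does. The sketch below illustrates this under stated assumptions; it is not the authors' implementation. The helper names (`error_rate`, `cross_entropy`, `should_update`) and the toy predictions `model_a` and `model_b` are hypothetical. `should_update` is a simplified form of the classic Tri-training acceptance test for newly labeled data (the subsampling branch is omitted). The measure is left pluggable so that cross entropy can be passed in where the original test uses the error rate. How the paper actually substitutes cross entropy, and its convex-optimization "safe" step, are not given in this record and are not modeled here.

```python
import math


def error_rate(probs, labels):
    """Fraction of samples whose arg-max prediction differs from the true label."""
    wrong = sum(
        1 for p, y in zip(probs, labels)
        if max(range(len(p)), key=p.__getitem__) != y
    )
    return wrong / len(labels)


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood (natural log) of the true labels."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def should_update(measure_t, size_t, measure_prev, size_prev):
    """Simplified Tri-training acceptance test (hypothetical helper).

    Accept the new pseudo-labeled set only if the measure dropped, the set grew,
    and measure * set size decreased. Plugging in cross entropy instead of the
    error rate is an assumption made for illustration.
    """
    return (measure_t < measure_prev
            and size_t > size_prev
            and measure_t * size_t < measure_prev * size_prev)


# Toy data: both models misclassify only the third sample, so their error
# rates are identical, but model_a is far more confident where it is right.
labels = [0, 1, 1, 0]
model_a = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]
model_b = [[0.55, 0.45], [0.45, 0.55], [0.95, 0.05], [0.51, 0.49]]

if __name__ == "__main__":
    print(error_rate(model_a, labels), error_rate(model_b, labels))  # 0.25 0.25
    print(round(cross_entropy(model_a, labels), 3),
          round(cross_entropy(model_b, labels), 3))                 # 0.4 1.216
    print(should_update(0.2, 120, 0.4, 80))                          # True
```

The error rate cannot tell the two models apart, while cross entropy penalizes model_b's confident mistake (probability 0.05 on the true class). This is the kind of distinction the abstract cites as its reason for replacing the error rate.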
Keywords: semi-supervised learning; Tri-training algorithm; cross entropy; convex optimization; sample labeling
CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]