Affiliation: [1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Source: Journal of Nanjing University (Natural Science), 2016, No. 4, pp. 662-671 (10 pages)
Funding: National Natural Science Foundation of China Youth Program (61305067); Fundamental Research Funds for the Central Universities (020214380025)
Abstract: Semi-supervised learning, which uses unlabeled data to assist learning from labeled data, has been a popular research direction in machine learning. One of its major paradigms exploits the disagreement between multiple classifiers, and co-training is perhaps the most classical representative of this paradigm. Co-training assumes a two-view setting: it trains one classifier on each view and lets the two classifiers iteratively label new instances for each other to enlarge the training set. It has been proved that when both views are sufficient, co-training can find the optimal classifier on each view. In practice, however, views may be corrupted by feature degradation or noise, so that a view cannot provide enough information to correctly determine an instance's label. Under such view insufficiency, the optimal classifiers on the two views may no longer be compatible: some labels provided by one view's classifier may be misleading for learning the optimal classifier on the other view. To mitigate this effect, we propose an improved co-training algorithm named Compatible Co-training, which automatically identifies and eliminates misleadingly labeled instances. In each iteration the algorithm records the label assigned to every newly labeled instance; after the classifiers are updated, it re-predicts the labels of these instances and dynamically eliminates those whose predictions conflict with their recorded labels, thereby removing samples that hinder learning the optimal classifiers. Experiments show that in most cases Compatible Co-training generalizes better and converges faster than the original co-training algorithm. Moreover, Compatible Co-training remains robust when the two views' classifiers differ widely in initial accuracy, a situation in which co-training's performance deteriorates significantly.
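The abstract describes the algorithm only in prose; the following is a minimal sketch of the idea, assuming scikit-learn-style classifiers with fit/predict/predict_proba. All names (compatible_co_training, n_add, rounds) are hypothetical, and details the abstract leaves open (confidence-based selection, permanently dropping conflicting samples) are our assumptions, not the authors' implementation.

```python
# Hypothetical sketch of Compatible Co-training, not the paper's code.
import numpy as np
from sklearn.base import clone

def compatible_co_training(base_clf, X1, X2, y, X1_u, X2_u,
                           rounds=20, n_add=2):
    """Co-training over two views (X1/X2: labeled, X1_u/X2_u: unlabeled);
    records the label given to each pseudo-labeled instance and deletes
    instances whose recorded label conflicts with updated predictions."""
    clf1, clf2 = clone(base_clf), clone(base_clf)
    y = np.asarray(y)
    pseudo = {}                    # unlabeled index -> recorded label
    pool = set(range(len(X1_u)))   # indices still available for labeling

    def refit():
        # Retrain both classifiers on labeled data plus current pseudo-labels.
        idx = sorted(pseudo)
        Xa1 = np.vstack([X1, X1_u[idx]]) if idx else X1
        Xa2 = np.vstack([X2, X2_u[idx]]) if idx else X2
        ya = np.concatenate([y, [pseudo[i] for i in idx]]) if idx else y
        clf1.fit(Xa1, ya)
        clf2.fit(Xa2, ya)

    refit()
    for _ in range(rounds):
        if not pool:
            break
        # Standard co-training step: each view's classifier labels its
        # most confident instances from the pool for the other view.
        for clf, X_u in ((clf1, X1_u), (clf2, X2_u)):
            cand = sorted(pool)
            if not cand:
                break
            proba = clf.predict_proba(X_u[cand])
            for j in np.argsort(proba.max(axis=1))[::-1][:n_add]:
                i = cand[j]
                pseudo[i] = clf.classes_[proba[j].argmax()]
                pool.discard(i)
        refit()
        # Compatible Co-training step: re-predict every pseudo-labeled
        # instance with both updated classifiers and delete those whose
        # prediction disagrees with the label recorded for them.
        idx = sorted(pseudo)
        if idx:
            p1, p2 = clf1.predict(X1_u[idx]), clf2.predict(X2_u[idx])
            for i, a, b in zip(idx, p1, p2):
                if a != pseudo[i] or b != pseudo[i]:
                    del pseudo[i]  # misleading sample removed from training
            refit()
    return clf1, clf2
```

With, for example, base_clf = GaussianNB() and two feature splits of a dataset, the two returned classifiers could be combined by averaging their predicted probabilities; whether removed samples may later re-enter the pool is a design choice the abstract does not pin down.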