An improved co-training style algorithm: Compatible Co-training (cited by 11)

Authors: Guo Xiangyu [1], Wang Wei [1]

Affiliation: [1] National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Source: Journal of Nanjing University (Natural Science), 2016, No. 4, pp. 662-671 (10 pages)

Funding: National Natural Science Foundation of China Young Scientists Fund (61305067); Fundamental Research Funds for the Central Universities (020214380025)

Abstract: Semi-supervised learning has become a popular research direction in machine learning; it focuses on using unlabeled data to assist learning from labeled data. One of its major paradigms exploits the disagreement between multiple classifiers, and co-training is perhaps the most classical representative of this paradigm. Co-training assumes a two-view setting: it trains one classifier on each view and lets the two classifiers iteratively label new instances for each other to enlarge the training set. It has been proved that when both views are sufficient, co-training can find the optimal classifier on each view. In practice, however, views may be corrupted by feature degradation or noise, so that a view cannot provide enough information to correctly determine an instance's label. Under such insufficient views, the optimal classifiers on the two views are no longer compatible, and some instances labeled by the classifier on one view may mislead the other view away from its optimal classifier. To mitigate the effect of view insufficiency, we propose an improved co-training algorithm named Compatible Co-training, which automatically identifies and eliminates misleadingly labeled instances. In each iteration, the algorithm records the label assigned to every newly labeled instance; the updated classifier then re-predicts the labels of all instances labeled by the other classifier, and instances whose predicted labels conflict with their recorded labels are dynamically removed. Experiments show that in most cases Compatible Co-training generalizes better and converges faster than the original co-training algorithm. Moreover, Compatible Co-training remains robust when the two views' initial classifiers differ greatly in accuracy, a situation in which co-training's performance deteriorates significantly.
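The following Python sketch illustrates the procedure described in the abstract; it is not the authors' implementation. The function and parameter names (compatible_co_training, rounds, n_per_round) are illustrative assumptions, and scikit-learn-style classifiers with fit/predict/predict_proba are assumed.

```python
# Hypothetical sketch of the Compatible Co-training loop; X1_u and X2_u must
# be two feature views of the SAME unlabeled instances so indices align.
import numpy as np
from sklearn.base import clone


def compatible_co_training(clf1, clf2, X1_l, X2_l, y_l, X1_u, X2_u,
                           rounds=30, n_per_round=5):
    """Two-view co-training that drops pseudo-labeled instances whose
    recorded label conflicts with the updated classifier's prediction."""
    clfs = [clone(clf1), clone(clf2)]
    Xl, Xu = [X1_l, X2_l], [X1_u, X2_u]
    # For each view: unlabeled-pool index -> label it was assigned.
    pseudo = [{}, {}]

    def refit(v):
        idx = list(pseudo[v])
        if idx:
            X = np.vstack([Xl[v], Xu[v][idx]])
            y = np.concatenate([y_l, [pseudo[v][i] for i in idx]])
        else:
            X, y = Xl[v], y_l
        clfs[v].fit(X, y)

    refit(0)
    refit(1)

    for _ in range(rounds):
        for v in (0, 1):
            other = 1 - v
            # The classifier on view v labels its most confident unlabeled
            # instances for the classifier on the other view.
            proba = clfs[v].predict_proba(Xu[v])
            order = np.argsort(-proba.max(axis=1))
            picked = [i for i in order if i not in pseudo[other]][:n_per_round]
            for i in picked:
                pseudo[other][i] = clfs[v].classes_[proba[i].argmax()]
            refit(other)

            # Compatibility check: re-predict every pseudo-labeled instance
            # with the updated classifier and drop those whose recorded label
            # is now contradicted, then refit on the cleaned training set.
            idx = list(pseudo[other])
            if idx:
                pred = clfs[other].predict(Xu[other][idx])
                dropped = [i for i, p in zip(idx, pred)
                           if p != pseudo[other][i]]
                for i in dropped:
                    del pseudo[other][i]
                if dropped:
                    refit(other)
    return clfs
```

As a usage example, clf1 and clf2 could be two GaussianNB instances, each receiving one feature view of the same labeled and unlabeled data.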

Keywords: semi-supervised learning, co-training, insufficient views, inconsistent labels

Classification codes: TP181 [Automation and Computer Technology — Control Theory and Control Engineering]; TP301.6 [Automation and Computer Technology — Control Science and Engineering]

 
