一种基于最大值损失函数的快速偏标记学习算法  被引量:2

A Fast Partial Label Learning Algorithm Based on Max-loss Function

在线阅读下载全文

作  者:周瑜[1] 贺建军[1,2] 顾宏[1] 张俊星[2] 

机构地区:[1]大连理工大学电子信息与电气工程学部,辽宁大连116024 [2]大连民族大学信息与通信工程学院,辽宁大连116600

出  处:《计算机研究与发展》2016年第5期1053-1062,共10页Journal of Computer Research and Development

基  金:国家自然科学基金项目(61503058;61374170;61502074;U1560102);高等学校博士学科点专项科研基金项目(20120041110008);中央高校基本科研业务费专项资金项目(DC201501055;DC201501060201)~~

摘  要:在弱监督信息条件下进行学习已成为大数据时代机器学习领域的研究热点,偏标记学习是最近提出的一种重要的弱监督学习框架,主要解决在只知道训练样本的真实标记属于某个候选标记集合的情况下如何进行学习的问题,在很多领域都具有广泛应用.最大值损失函数可以很好地描述偏标记学习中的样本与候选标记间的关系,但是由于建立的模型通常是一个难以求解的非光滑函数,目前还没有建立基于该损失函数的偏标记学习算法.此外,已有的偏标记学习算法都只能处理样本规模比较小的问题,还没看到面向大数据的算法.针对以上2个问题,先利用凝聚函数逼近最大值损失函数中的max(·)将模型的目标函数转换为一个光滑的凹函数,然后利用随机拟牛顿法对其进行求解,最终实现了一种基于最大值损失函数的快速偏标记学习算法.仿真实验结果表明,此算法不仅要比基于均值损失函数的传统算法取得更好的分类精度,运行速度上也远远快于这些算法,处理样本规模达到百万级的问题只需要几分钟.In the age of big data ,learning with weak supervision has become one of the hot research topics in machine learning field . Partial label learning , which deals with the problem where each training example is associated with a set of candidate labels among which only one label corresponds to the ground-truth ,is an important weakly-supervised machine learning frameworks proposed recently and can be widely used in many real world tasks .The max-loss function may be used to accurately capture the relationship between the partial labeled sample and its labels .However ,since the max-loss function usually brings us a nondifferentiable objective function difficult to be solved ,it is rarely adopted in the existing algorithms .Moreover ,the existing partial label learning algorithms can only deal with the problem with small-scale data ,and rarely can be used to deal with big data .To cure above two problems , this paper presents a fast partial label learning algorithm with the max-loss function .The basic idea is to transform the nondifferentiable objective to a differentiable concave function by introducing the aggregate function to approximate the max (?) function involved in the max-lass function ,and then to solve the obtained concave objective function by using a stochastic quasi-New ton method . The experimental results show that the proposed algorithm can not only achieve higher accuracy but also use shorter computing time than the state-of-the-art algorithms with average-loss functions .Moreover ,the proposed algorithm can deal with the problems with millions samples within several minutes .

关 键 词:偏标记学习 最大值损失函数 凝聚函数 弱监督学习 分类精度 

分 类 号:TP391[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象