Boosting-Based Semi-Supervised AUC Optimization: Theory and Algorithm  (Cited by: 6)

Authors: YANG Zhi-Yong; XU Qian-Qian [2]; HE Yuan; CAO Xiao-Chun [4]; HUANG Qing-Ming

Affiliations: [1] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408; [2] Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [3] Alibaba Turing Security Lab, Beijing 100102; [4] State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093; [5] Key Laboratory of Big Data Mining and Knowledge Management (BDKM), University of Chinese Academy of Sciences, Beijing 101408; [6] Peng Cheng Laboratory, Shenzhen, Guangdong 518055

Source: Chinese Journal of Computers (《计算机学报》), 2022, No. 8, pp. 1598-1617 (20 pages)

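Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102003); National Natural Science Foundation of China (61620106009, 61931008, 61836002, U2001202, 61976202); Fundamental Research Funds for the Central Universities; Strategic Priority Research Program of the Chinese Academy of Sciences (XDB28000000); National Postdoctoral Program for Innovative Talents (BX2021298); Youth Innovation Promotion Association of the Chinese Academy of Sciences; Alibaba Group ARF program.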

Abstract: Area Under the ROC Curve (AUC) is a standard evaluation metric for tasks such as class-imbalanced classification and bipartite ranking. This paper focuses on semi-supervised AUC optimization. Most existing methods perform semi-supervised AUC optimization with a single model and rarely consider the benefit of combining multiple models through ensemble techniques. To address this limitation, this paper studies ensemble-based semi-supervised AUC optimization. Specifically, we propose a boosting-based semi-supervised AUC optimization algorithm, together with an acceleration strategy based on weight decoupling that reduces its time and space complexity. On the optimization side, we prove that the proposed algorithm converges at an exponential rate with respect to the number of weak learners. On the generalization side, we derive a generalization error bound for the proposed algorithm and prove that increasing the number of weak learners improves training-set performance without incurring a significant risk of overfitting. Finally, we evaluate the proposed algorithm on 16 benchmark datasets. Experimental results show that it outperforms the competing methods in most cases at the 0.05 significance level and achieves an average performance gain of 0.9% to 11.28%.
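
For readers unfamiliar with the metric, the following is a minimal sketch of the standard empirical AUC: the fraction of (positive, negative) pairs that a scoring function ranks correctly, with ties counted as one half. It illustrates the metric only, not the boosting algorithm described in the abstract; the function name `empirical_auc` and its arguments are hypothetical and chosen for illustration.

```python
from itertools import product


def empirical_auc(pos_scores, neg_scores):
    """Empirical AUC from scores of positive and negative examples.

    Counts the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties contribute 0.5.
    This is an illustrative helper, not code from the paper.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            total += 1.0      # pair ranked correctly
        elif p == n:
            total += 0.5      # tie: half credit
    return total / (len(pos_scores) * len(neg_scores))
```

Because the metric depends only on pairwise comparisons between the two classes, it is insensitive to the class ratio, which is why it is commonly used in class-imbalanced settings.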

Keywords: AUC optimization; ensemble learning; semi-supervised learning; boosting; Rademacher complexity

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
