A New Semi-Supervised Inductive Transfer Learning Framework: Co-Transfer (Cited by: 2)

Authors: Wen Yimin (文益民); Yuan Zhe (员喆)[1]; Yu Hang (余航)

Affiliations: [1] School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004; [2] Guangxi Key Laboratory of Image and Graphic Intelligent Processing (Guilin University of Electronic Technology), Guilin, Guangxi 541004; [3] School of Computer Engineering and Science, Shanghai University, Shanghai 200444

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2023, Issue 7, pp. 1603-1614 (12 pages)

Funding: National Natural Science Foundation of China (61866007); Natural Science Foundation of Guangxi (2018GXNSFDA138006); Guangxi Key Research and Development Program (Guike AB21220023); Guangxi Key Laboratory of Image and Graphic Intelligent Processing (GIIP2005, GIIP201505).

Abstract: In many practical data mining scenarios, such as network intrusion detection, Twitter spam detection, and computer-aided diagnosis, a source domain whose distribution differs from but is related to the target domain is very common. Generally, a large amount of unlabeled data is available in both the source domain and the target domain, while labeling every sample is difficult, expensive, time-consuming, and sometimes unnecessary. It is therefore important and worthwhile to fully exploit the labeled and unlabeled data in both domains to solve the classification task in the target domain. Combining inductive transfer learning and semi-supervised learning, we propose a semi-supervised inductive transfer learning framework named Co-Transfer. Co-Transfer first generates three TrAdaBoost classifiers for transfer learning from the original source domain to the original target domain, and simultaneously generates another three TrAdaBoost classifiers for transfer learning from the original target domain to the original source domain. Both groups are trained on bootstrap samples (sampling with replacement) drawn from the originally labeled samples of the source and target domains. In each iteration of Co-Transfer, each group of TrAdaBoost classifiers is updated with a new training set: one part consists of the original labeled samples, another part consists of samples labeled by this group of TrAdaBoost classifiers, and the remaining part consists of samples labeled by the other group. When the iterations terminate, the ensemble of the three TrAdaBoost classifiers trained to transfer from the original source domain to the original target domain serves as the final classifier for the target domain. Experimental results on UCI datasets and text classification datasets show that Co-Transfer can effectively exploit the labeled and unlabeled data of the source and target domains to improve generalization performance.
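To make the iterative scheme described in the abstract concrete, the following is a minimal Python sketch of the Co-Transfer loop. It is only an illustration under stated assumptions, not the authors' implementation: TrAdaBoost itself is not reproduced (a decision tree trained on the pooled labeled data stands in for it), binary 0/1 labels and a shared feature space are assumed, and all names (co_transfer, train_group, fit_base, rounds, k) are hypothetical rather than taken from the paper.

# Hypothetical sketch of the Co-Transfer workflow; the base transfer learner
# is a stand-in for TrAdaBoost so that the framework logic stays runnable.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def fit_base(X_aux, y_aux, X_main, y_main, seed):
    # Stand-in for TrAdaBoost: a real implementation would iteratively
    # down-weight auxiliary-domain samples that disagree with the main domain.
    clf = DecisionTreeClassifier(max_depth=3, random_state=seed)
    clf.fit(np.vstack([X_aux, X_main]), np.concatenate([y_aux, y_main]))
    return clf

def group_predict(group, X):
    # Majority vote of the three classifiers in one group (binary labels).
    votes = np.stack([clf.predict(X) for clf in group]).astype(int)
    return (votes.sum(axis=0) >= 2).astype(int)

def train_group(aux, main, rng):
    # Three classifiers, each trained on bootstrap resamples (sampling with
    # replacement) of the labeled auxiliary- and main-domain data.
    group = []
    for _ in range(3):
        Xa, ya = resample(*aux, random_state=rng.randint(1 << 30))
        Xm, ym = resample(*main, random_state=rng.randint(1 << 30))
        group.append(fit_base(Xa, ya, Xm, ym, rng.randint(1 << 30)))
    return group

def co_transfer(Ls, Lt, Us, Ut, rounds=5, k=20, seed=0):
    # Ls, Lt: (X, y) labeled source / target data; Us, Ut: unlabeled arrays.
    rng = np.random.RandomState(seed)
    s2t = train_group(Ls, Lt, rng)        # source -> target group
    t2s = train_group(Lt, Ls, rng)        # target -> source group
    pl_t = (np.empty((0, Lt[0].shape[1])), np.empty(0, dtype=int))
    pl_s = (np.empty((0, Ls[0].shape[1])), np.empty(0, dtype=int))
    for _ in range(rounds):
        if len(Ut) < k or len(Us) < k:
            break
        # Each group pseudo-labels k unlabeled samples of the domain it targets.
        Xt_new, Ut = Ut[:k], Ut[k:]
        Xs_new, Us = Us[:k], Us[k:]
        pl_t = (np.vstack([pl_t[0], Xt_new]),
                np.concatenate([pl_t[1], group_predict(s2t, Xt_new)]))
        pl_s = (np.vstack([pl_s[0], Xs_new]),
                np.concatenate([pl_s[1], group_predict(t2s, Xs_new)]))
        # Each group is retrained on the original labels plus the pseudo-labels
        # produced by its own group and by the other group.
        Lt_aug = (np.vstack([Lt[0], pl_t[0]]), np.concatenate([Lt[1], pl_t[1]]))
        Ls_aug = (np.vstack([Ls[0], pl_s[0]]), np.concatenate([Ls[1], pl_s[1]]))
        s2t = train_group(Ls_aug, Lt_aug, rng)
        t2s = train_group(Lt_aug, Ls_aug, rng)
    # The group transferring source -> target is the final target-domain ensemble.
    return s2t

A caller would use it roughly as group = co_transfer((Xs, ys), (Xt, yt), Xs_unlabeled, Xt_unlabeled) and then predict on target-domain test data with group_predict(group, X_test); the pool size k and the number of rounds are illustrative hyperparameters, not values reported in the paper.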

Keywords: semi-supervised learning; transfer learning; multi-task learning; bidirectional transfer; ensemble learning

CLC Number: TP391 [Automation and Computer Technology - Computer Application Technology]

 
