A Self-training-based Label Noise Correction Algorithm for Crowdsourcing

Authors: YANG Yi (杨艺), JIANG Liang-Xiao (蒋良孝), LI Chao-Qun (李超群)[3]

Affiliations: [1] School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074; [2] Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences (Wuhan), Wuhan 430074; [3] School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan 430074

Source: Acta Automatica Sinica (《自动化学报》), 2023, No. 4, pp. 830-844 (15 pages)

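Funding: Supported by the Joint Funds of the National Natural Science Foundation of China (U1711267) and the Fundamental Research Funds for the Central Universities (CUGGC03).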

Abstract: Crowdsourced labels still contain a certain level of noise after label integration. To address this problem, this paper proposes a self-training-based label noise correction (STLNC) algorithm for crowdsourcing. STLNC has three stages. In the first stage, STLNC employs a filter to split the crowdsourced dataset with integrated labels into a noisy set and a clean set. In the second stage, the weighted density peak clustering algorithm is used to construct a spatial structure relationship in which low-density instances in the dataset point to high-density instances. In the third stage, a noise instance selection strategy is first designed according to the discovered spatial structure relationship. Then, the selected noise instances are corrected by an ensemble classifier trained on the clean set, following a designed instance correction strategy; the corrected instances are added to the clean set, and the ensemble classifier is retrained. The instance selection and correction process is repeated until all instances in the noisy set have been corrected. Finally, the ensemble classifier trained in the last round is used to correct all instances. Experimental results on both simulated benchmark datasets and real-world crowdsourced datasets show that STLNC significantly outperforms five other state-of-the-art noise correction algorithms in terms of noise ratio and model quality.
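The abstract names the three stages of STLNC but not the concrete filter, density weighting, selection rule, correction rule, or ensemble members. The NumPy sketch below only illustrates the overall control flow under explicitly hypothetical substitutions: a k-NN majority-vote filter stands in for the paper's filter; an unweighted, Gaussian-kernel density peak step (each instance points to its nearest instance of strictly higher density) stands in for the weighted density peak clustering; the selection rule (a noisy instance is corrected once the instance it points to is clean, otherwise the densest remaining noisy instance is taken) is an assumption; and a majority vote over k-NN classifiers with k in {1, 3, 5} stands in for the ensemble classifier. All function names (`knn_filter`, `density_peak_parents`, `stlnc_sketch`, etc.) and parameters (`k_filter`, `dc`) are illustrative, not taken from the paper.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Majority vote of the k nearest training instances (ties -> smallest label)."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :min(k, len(X_train))]
    return np.array([np.bincount(y_train[row]).argmax() for row in idx])


def ensemble_predict(X_train, y_train, X_query, ks=(1, 3, 5)):
    """Hypothetical ensemble: majority vote over k-NN members with different k."""
    votes = np.stack([knn_predict(X_train, y_train, X_query, k) for k in ks])
    return np.array([np.bincount(col).argmax() for col in votes.T])


def knn_filter(X, y, k=5):
    """Stage 1 stand-in: flag an instance as noisy when its integrated label
    disagrees with the majority label of its k nearest other instances."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # an instance must not vote for itself
    idx = np.argsort(d, axis=1)[:, :k]
    majority = np.array([np.bincount(y[row]).argmax() for row in idx])
    return majority != y  # True = noisy set, False = clean set


def density_peak_parents(X, dc=1.0):
    """Stage 2 stand-in (unweighted): Gaussian-kernel local density rho, and
    for each instance the nearest instance of strictly higher density (-1 if none)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0  # subtract the self term
    parent = np.full(len(X), -1)
    for i in range(len(X)):
        higher = np.flatnonzero(rho > rho[i])
        if higher.size:
            parent[i] = higher[np.argmin(d[i, higher])]
    return rho, parent


def stlnc_sketch(X, y, k_filter=5, dc=1.0):
    """Stage 3 control flow: select, correct, grow the clean set, repeat."""
    y = y.copy()
    noisy = knn_filter(X, y, k_filter)
    if noisy.all():
        raise ValueError("filter flagged every instance; clean set is empty")
    rho, parent = density_peak_parents(X, dc)
    has_parent = parent >= 0
    while noisy.any():
        clean = ~noisy
        # Hypothetical selection rule: a noisy instance is ready once the
        # higher-density instance it points to is already clean.
        parent_clean = np.zeros(len(y), dtype=bool)
        parent_clean[has_parent] = clean[parent[has_parent]]
        sel = np.flatnonzero(noisy & parent_clean)
        if sel.size == 0:  # otherwise take the densest remaining noisy instance
            noisy_idx = np.flatnonzero(noisy)
            sel = noisy_idx[[np.argmax(rho[noisy_idx])]]
        # "Retrain" on the current clean set (k-NN is lazy) and relabel.
        y[sel] = ensemble_predict(X[clean], y[clean], X[sel])
        noisy[sel] = False  # corrected instances join the clean set
    # Final round: the last ensemble relabels every instance.
    return ensemble_predict(X, y, X)
```

For example, on two tight, well-separated clusters in which one instance's integrated label has been flipped, the filter flags only that instance, and the labels returned by `stlnc_sketch` match cluster membership. The kernel width `dc` controls how local the density estimate is; in this sketch it must be chosen relative to the scale of the features, and the loop always terminates because every iteration moves at least one instance from the noisy set to the clean set.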

Keywords: crowdsourcing learning; self-training; integrated label; label noise; noise correction

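CLC classification numbers: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP181 [Automation and Computer Technology: Computer Science and Technology]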