An Active Learning Method Based on Density Clustering and Neighborhood  (Cited by: 3)

Authors: LIU Zhixiu; HU Feng [1,2]; DENG Weibin [2]; YU Hong [1,2]

Affiliations: [1] School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; [2] Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Source: Journal of Shanxi University (Natural Science Edition), 2020, No. 4, pp. 850-857 (8 pages)

Funding: National Key R&D Program of China (2018YFC0832100, 2018YFC0832102); National Natural Science Foundation of China (61876027, 61876201).

Abstract: Active learning is a subfield of machine learning in which the learner chooses which samples to learn from; it mainly addresses the problem that large amounts of unlabeled data cannot be used effectively. This paper combines a density clustering algorithm with a neighborhood model to propose an active learning method that alternates between clustering and selecting samples for labeling. First, the density peaks clustering algorithm (DCFSFDP, density clustering by fast search and find of density peaks) partitions the data set into clusters. Second, a selection strategy built on each sample's neighborhood information picks some samples to be labeled and added to the labeled set, and these labeled samples are used to correct the clustering result in the next clustering round, so that the partition becomes more accurate. Finally, the process stops when the number of labeled samples reaches a specified upper limit. Experimental results show that the method can actively label a large number of samples starting from only a few labeled ones, indicating that the algorithm can be applied effectively to large amounts of unlabeled data.
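The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the selection rule or the correction rule, so the sketch queries, in each cluster, the unlabeled point with the most unlabeled neighbors within the cutoff, and it replaces the paper's cluster-correction step with a simpler stand-in in which each point takes the label of the nearest queried point in its cluster. The names `density_peaks`, `active_label`, `oracle`, `d_c`, `k`, and `budget` are all hypothetical.

```python
import math

def density_peaks(points, d_c, k):
    """Simplified density peaks clustering; returns one cluster id per point."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Local density: how many other points lie closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta, parent = [0.0] * n, [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(d[i])  # densest point: use its largest distance
        else:
            # Nearest point among those visited earlier (equal or higher density).
            parent[i] = min(order[:pos], key=lambda j: d[i][j])
            delta[i] = d[i][parent[i]]
    # Centers: the k points with the largest rho * delta.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:k]
    cluster = [-1] * n
    for c_id, c in enumerate(centers):
        cluster[c] = c_id
    for i in order:  # each non-center inherits the cluster of its parent
        if cluster[i] == -1:
            cluster[i] = cluster[parent[i]]
    return cluster

def active_label(points, oracle, d_c, k, budget):
    """Query up to `budget` labels, then label every point (assumed rules)."""
    n = len(points)
    cluster = density_peaks(points, d_c, k)
    labeled = {}  # point index -> label returned by the oracle
    budget = min(budget, n)
    while len(labeled) < budget:
        for c_id in range(k):
            pool = [i for i in range(n) if cluster[i] == c_id and i not in labeled]
            if not pool or len(labeled) >= budget:
                continue
            # Assumed rule: pick the point with the most unlabeled neighbors.
            q = max(pool, key=lambda i: sum(
                1 for j in pool if j != i and math.dist(points[i], points[j]) < d_c))
            labeled[q] = oracle(q)
    if not labeled:
        return [None] * n, labeled
    predicted = []
    for i in range(n):
        # Stand-in for correction: nearest queried point in the same cluster,
        # falling back to all queried points if the cluster has none.
        same = [j for j in labeled if cluster[j] == cluster[i]] or list(labeled)
        nearest = min(same, key=lambda j: math.dist(points[i], points[j]))
        predicted.append(labeled[nearest])
    return predicted, labeled

# Usage on two well-separated toy groups; the oracle simply reads the true labels.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
truth = [0] * 5 + [1] * 5
predicted, labeled = active_label(points, lambda i: truth[i], d_c=1.2, k=2, budget=2)
print(sorted(labeled), predicted == truth)
```

In this toy run only two labels are requested, one per cluster, and the remaining points are labeled by propagation. Because the sketch computes the clustering once, it does not reproduce the re-clustering with correction that the abstract describes.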

Keywords: active learning; density peaks clustering algorithm; neighborhood model; sampling strategy; unlabeled samples

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
