An improved transfer fuzzy clustering with few labels  (Cited by: 2)

Authors: WANG Yue (王跃)[1], YANG Yan (杨燕)[1], WANG Hongjun (王红军)[1]

Affiliation: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, Sichuan, China

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2016, No. 3, pp. 310-317 (8 pages)

Funding: National Natural Science Foundation of China (61170111; 61572407; 61134002); Sichuan Science and Technology Support Program (2014SZ0207)

Abstract: Traditional clustering algorithms have difficulty exploiting existing historical information, and their results are poor when the data are contaminated. Semi-supervised clustering is commonly used when part of the data is labeled. For the case in which the source data carry a small number of labels, this paper proposes a semi-supervised fuzzy possibilistic C-means algorithm (SS-FPCM). Within a transfer learning framework, the algorithm is then modified to address the negative transfer problem, giving a transfer semi-supervised fuzzy possibilistic C-means algorithm (TSS-FPCM). Finally, to make full use of the source data, "representative points" stand in for the class information of the source data and are incorporated into the algorithm for a further round of transfer, yielding an improved transfer semi-supervised fuzzy possibilistic C-means algorithm (ITSS-FPCM). Experiments show that, compared with other clustering algorithms, all three algorithms use the source data effectively to improve clustering performance. SS-FPCM and TSS-FPCM exploit the few labeled samples in the source data, while ITSS-FPCM combines the labeled data with the "representative points" and obtains good clustering results when data information is scarce or the data are contaminated.
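As background for the abstract above, the sketch below shows plain fuzzy C-means (FCM), one of the base methods named in the keywords. It is not the paper's SS-FPCM, TSS-FPCM, or ITSS-FPCM: the abstract does not give their objective functions, so the possibilistic term, the label term, and the transfer term are all left out. The function name `fuzzy_c_means` and its parameters (`c` clusters, fuzzifier `m`, `iters`, `tol`, `seed`) are illustrative choices, not taken from the source.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means on a list of equal-length numeric tuples.

    Returns (centers, memberships), where memberships[k][i] is the degree
    to which point k belongs to cluster i (each row sums to 1).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([value / total for value in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ki ** m.
        centers = []
        for i in range(c):
            weights = [u[k][i] ** m for k in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        power = 2.0 / (m - 1.0)
        shift = 0.0
        for k, p in enumerate(points):
            dist = [math.dist(p, center) for center in centers]
            if min(dist) == 0.0:
                # Point coincides with a center: split full membership
                # among the centers it coincides with.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                new_row = [h / sum(hits) for h in hits]
            else:
                new_row = [
                    1.0 / sum((dist[i] / dist[j]) ** power for j in range(c))
                    for i in range(c)
                ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[k])))
            u[k] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u
```

Calling `fuzzy_c_means(data, c=2)` on a list of 2-D tuples returns the cluster centers and one membership row per point. A semi-supervised or transfer variant would add further terms to the objective and change both update rules accordingly.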

Keywords: clustering; transfer learning; semi-supervised; possibilistic C-means; fuzzy C-means

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
