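Affiliations: [1] School of Software, Dalian University of Technology, Dalian, Liaoning 116620 [2] Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian, Liaoning 116620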
Source: Chinese Journal of Computers (《计算机学报》), 2019, No. 12, pp. 2614-2630 (17 pages)
Funding: Supported by the National Natural Science Foundation of China (61632019)
Abstract: Traditional clustering methods cluster each data set individually, using only the knowledge within that data set, but the samples in a single data set are sometimes insufficient to reveal a good cluster structure. In the real world, many data sets share the same class labels, so many related clustering tasks exist. Multi-task clustering transfers knowledge across related tasks to improve the clustering performance of each task, and it has received increasing attention in recent years. A good multi-task clustering algorithm should do two things: (1) make full use of the useful knowledge from the other related tasks; and (2) automatically assess task relatedness to avoid negative transfer. Existing multi-task clustering methods accomplish neither well. This paper proposes MTCFIR, a weighted multi-task clustering method based on feature and instance transfer. On the one hand, MTCFIR transfers both feature representation knowledge and instance knowledge across related tasks, and so makes fuller use of cross-task knowledge than most existing multi-task clustering methods. On the other hand, MTCFIR automatically learns task relatedness to avoid negative transfer, without the restrictions imposed by existing multi-task clustering methods that assess task relatedness, e.g., that all tasks have the same number of clusters and that the label marginal distribution of each task is uniform. MTCFIR proceeds in three steps. First, it learns a common feature representation across the related tasks with marginalized stacked denoising autoencoders (SDA). SDA abstracts a set of high-level features that capture generic concepts; learning these features for multi-task data helps extract the commonality among the original features of different tasks. This step reduces the distribution difference among the tasks by transferring feature representation knowledge, which is a prerequisite for learning the consistent similarity matrices. Second, it learns a consistent similarity matrix for each task by transferring instance knowledge across tasks, and it weights the tasks to determine how much each one contributes to the consistent similarity matrix of a given task. This step avoids the negative effect of forcing knowledge transfer between weakly related tasks. Finally, it performs symmetric nonnegative matrix factorization on the consistent similarity matrix of each task to obtain the clustering result. Experiments on real data sets show that the proposed method clusters better than traditional single-task clustering methods and existing multi-task clustering methods, and is more efficient than most multi-task clustering methods.
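The final step named in the abstract, symmetric nonnegative matrix factorization of a similarity matrix, can be illustrated with a small sketch. This is not the MTCFIR implementation: the toy similarity matrix `S`, the function name `symmetric_nmf`, and the damped multiplicative update rule (a common choice in the symmetric NMF literature) are all assumptions made for illustration, and the paper may use a different solver.

```python
import numpy as np

def symmetric_nmf(S, k, n_iter=500, beta=0.5, eps=1e-10, seed=0):
    """Approximate S by H @ H.T with H >= 0 (illustrative solver, not the paper's).

    Uses the damped multiplicative update
        H <- H * ((1 - beta) + beta * (S H) / (H H^T H)).
    """
    rng = np.random.default_rng(seed)
    # Strictly positive random start so no entry is stuck at zero.
    H = rng.random((S.shape[0], k)) + 0.1
    for _ in range(n_iter):
        numer = S @ H
        denom = H @ (H.T @ H) + eps  # eps guards against division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H

# Hypothetical 6x6 similarity matrix with two groups of three samples:
# high similarity inside each group, low similarity across groups.
S = np.full((6, 6), 0.05)
S[:3, :3] = 1.0
S[3:, 3:] = 1.0

H = symmetric_nmf(S, k=2)
# Each sample is assigned to the column of H with the largest weight.
labels = H.argmax(axis=1)
print(labels)
```

Reading the cluster label of a sample off the largest entry in its row of `H` is the usual way to turn the factor into a partition. In the method described above, `S` would be the consistent similarity matrix learned for one task instead of a hand-built matrix.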
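Keywords: multi-task clustering; feature representation transfer; instance transfer; task relatedness learning; consistent similarity matrix learning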
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]