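Authors: Chang Yu (常瑜)[1,2], Liang Jiye (梁吉业)[1,2], Gao Jiawei (高嘉伟)[1,2], Yang Jing (杨静)[1,2]
Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006; [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of the Ministry of Education, Taiyuan 030006
Source: Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), 2012, No. 4, pp. 405-411 (7 pages)
Funding: National Natural Science Foundation of China (71031006; 70971080); National "973" Program preliminary research special project (2011CB311805); Specialized Research Fund for the Doctoral Program of Higher Education (20101401110002)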
Abstract (translated from the Chinese): Semi-supervised clustering studies how to use a small amount of supervision information to improve clustering performance, and it has become an active research topic in machine learning. Most existing semi-supervised clustering methods do not consider the two kinds of supervision information, the seeds set and pairwise constraints, together. This paper therefore proposes a semi-supervised clustering algorithm based on a seeds set and pairwise constraints. The algorithm uses the Tri-training algorithm to enlarge the seeds set, and uses pairwise constraints both to optimize the seeds set and to guide the clustering process. Experimental results show that the algorithm effectively improves clustering performance.

Abstract: Semi-supervised learning, an application-driven family of machine learning methods, has become one of the hot topics of artificial intelligence and pattern recognition. As a main branch of semi-supervised learning, semi-supervised clustering introduces a small amount of supervision information into the search for an optimal clustering. Various semi-supervised clustering algorithms have been proposed recently, including search-based methods, similarity-based methods, and methods that combine search and similarity. However, most current semi-supervised clustering algorithms do not use seeds and pairwise constraints at the same time. Therefore, a semi-supervised clustering algorithm based on seeds and pairwise constraints is introduced in order to make full use of the given supervision information. Tri-training is a representative method based on the Co-training mechanism. Because Tri-training can use three classifiers to label unlabeled samples, the proposed algorithm uses it to obtain more labeled samples. First, based on the Tri-training method, some unlabeled samples are selected and annotated to enlarge the initial set of labeled samples. Second, pairwise constraints are used to optimize the enlarged set of labeled samples, with the purpose of improving its quality. Third, the initial cluster centers are obtained from the optimized labeled samples. Finally, the K-Means algorithm is run, and during the search pairwise constraints are used to correct the partitioning result at each iteration. The proposed algorithm is compared with the K-Means, Seeded-K-Means and COP-K-Means algorithms. Experimental results on three UCI data sets under the same settings demonstrate that this method can take full advantage of the given supervision information and obtain a better clustering result. Moreover, an experiment on the Haberman data set is conducted to analyze the relative impact of the number of pairwise constraints and the number of labeled samples on the algorithm's performance. Experimental results
Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
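The last two steps named in the abstract (initial centers taken from labeled seeds, then K-Means with pairwise constraints applied during assignment) can be illustrated with a minimal sketch. This is not the authors' implementation: the Tri-training enlargement and the seed optimization step are omitted, the constraint check follows the general COP-K-Means idea, and the function names, the fallback to the nearest center when no cluster is feasible, and the assumption that every cluster has at least one seed are all choices made here for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def violates(i, k, labels, must_link, cannot_link):
    """Return True if assigning point i to cluster k breaks a constraint
    against a point that has already been labeled in this pass."""
    for a, b in must_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] is not None and labels[j] != k:
            return True
    for a, b in cannot_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] == k:
            return True
    return False


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def seeded_constrained_kmeans(X, seeds, must_link=(), cannot_link=(), n_iter=20):
    """X: list of coordinate tuples.
    seeds: dict {point index: cluster id in 0..k-1}; assumed to cover every cluster.
    must_link / cannot_link: iterables of (index, index) pairs."""
    k = max(seeds.values()) + 1
    # Initial centers are the means of each cluster's seed points.
    centers = [mean([X[i] for i, c in seeds.items() if c == j]) for j in range(k)]
    labels = [None] * len(X)
    for _ in range(n_iter):
        new_labels = [None] * len(X)
        for i, x in enumerate(X):
            # Try clusters from nearest to farthest; keep the first feasible one.
            order = sorted(range(k), key=lambda j: dist(x, centers[j]))
            for j in order:
                if not violates(i, j, new_labels, must_link, cannot_link):
                    new_labels[i] = j
                    break
            else:
                # Assumption of this sketch: fall back to the nearest center
                # instead of failing when every cluster violates a constraint.
                new_labels[i] = order[0]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [X[i] for i, c in enumerate(labels) if c == j]
            if members:
                centers[j] = mean(members)
    return labels, centers
```

For example, with two well-separated groups of points plus one point lying closer to the second group, a cannot-link constraint between that point and a member of the second group moves it into the first cluster, whereas the unconstrained run leaves it in the second.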