Mixed Data Clustering Label Algorithm Based on User's Interest Domain

Times cited: 1

Authors: LI Deyu (李德玉) [1,2], WENG Xiaokui (翁小奎) [1], LI Yanhong (李艳红) [1,2]

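Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China; [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of the Ministry of Education, Shanxi University, Taiyuan, Shanxi 030006, China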

Source: Journal of Shanxi University (Natural Science Edition), 2013, No. 2, pp. 180-186 (7 pages)

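Funding: National Natural Science Foundation of China (61175067; 61272095); Natural Science Foundation of Shanxi Province (2010011021-1); Shanxi Province Key Science and Technology Research Project (20110321027-02)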

Abstract: Data clustering labeling clusters a small-scale sample and then uses the clustering result to assign labels to the remaining samples. It is an effective way to improve the efficiency of clustering large-scale data. Mixed data are the most widely used data type in real-world applications. This paper treats user's interest data as the small-scale sample, clusters it with the K-prototypes algorithm, and uses the clustering result to construct user's interest domains. The membership degree of an unlabeled sample to a user's interest domain is defined by the relationship between the sample's attribute values and the components of that domain. Based on the concepts of the user's interest domain and the "data-user's interest domain" membership degree, a mixed data clustering label algorithm, UIMCL (User's Interest Mixed Data Clustering Label), is proposed. Existing data labeling algorithms can assign only one class label to an unlabeled sample; UIMCL overcomes this limitation. It can be applied to recommendation services and user behavior analysis in electronic commerce. Experimental results show that the algorithm performs well on mixed data clustering labeling.
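The abstract describes two stages: K-prototypes clustering of the small user's-interest sample, then labeling the remaining records by their membership degree in the resulting interest domains. The Python sketch below illustrates that pipeline under stated assumptions. The clustering step uses Huang's standard K-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches) in a simple batch form. The interest-domain representation (numeric ranges and categorical value sets), the `membership` formula, and the `threshold` rule are hypothetical stand-ins chosen for illustration: the abstract does not give the paper's definitions, so this is not UIMCL itself.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# A record is (numeric attributes, categorical attributes).
Record = Tuple[Tuple[float, ...], Tuple[str, ...]]
# HYPOTHETICAL interest domain: per numeric attribute the [min, max] range of
# the cluster's members, per categorical attribute the set of observed values.
Domain = Tuple[List[Tuple[float, float]], List[Set[str]]]


def kproto_distance(x: Record, proto: Record, gamma: float) -> float:
    """Huang's K-prototypes dissimilarity: squared Euclidean distance on
    numeric attributes plus gamma times the count of categorical mismatches."""
    num = sum((a - b) ** 2 for a, b in zip(x[0], proto[0]))
    cat = sum(a != b for a, b in zip(x[1], proto[1]))
    return num + gamma * cat


def update_prototype(members: List[Record]) -> Record:
    """Mean of each numeric attribute, mode of each categorical attribute."""
    n_num, n_cat = len(members[0][0]), len(members[0][1])
    means = tuple(sum(m[0][j] for m in members) / len(members) for j in range(n_num))
    modes = []
    for j in range(n_cat):
        counts: Dict[str, int] = {}
        for m in members:
            counts[m[1][j]] = counts.get(m[1][j], 0) + 1
        modes.append(max(sorted(counts), key=counts.get))  # ties -> smallest value
    return means, tuple(modes)


def k_prototypes(data: Sequence[Record], k: int, gamma: float,
                 iters: int = 20, seed: int = 0) -> List[int]:
    """Batch K-prototypes; returns one cluster index per record."""
    rng = random.Random(seed)
    protos = [data[i] for i in rng.sample(range(len(data)), k)]
    assign = [-1] * len(data)
    for _ in range(iters):
        new = [min(range(k), key=lambda c: kproto_distance(x, protos[c], gamma))
               for x in data]
        if new == assign:  # converged: no record changed cluster
            break
        assign = new
        for c in range(k):
            members = [x for x, a in zip(data, assign) if a == c]
            if members:  # keep the old prototype if a cluster empties
                protos[c] = update_prototype(members)
    return assign


def build_domain(members: List[Record]) -> Domain:
    """HYPOTHETICAL domain construction from one cluster's members."""
    n_num, n_cat = len(members[0][0]), len(members[0][1])
    ranges = [(min(m[0][j] for m in members), max(m[0][j] for m in members))
              for j in range(n_num)]
    values = [{m[1][j] for m in members} for j in range(n_cat)]
    return ranges, values


def membership(x: Record, dom: Domain) -> float:
    """HYPOTHETICAL degree: fraction of attributes whose value falls inside
    the corresponding component of the domain."""
    ranges, values = dom
    hits = sum(lo <= v <= hi for v, (lo, hi) in zip(x[0], ranges))
    hits += sum(v in s for v, s in zip(x[1], values))
    return hits / (len(ranges) + len(values))


def label(x: Record, domains: Dict[int, Domain], threshold: float) -> List[int]:
    """Return every cluster id whose membership reaches the threshold."""
    return [c for c, d in sorted(domains.items()) if membership(x, d) >= threshold]


if __name__ == "__main__":
    sample = [((1.0, 20.0), ("book", "web")), ((1.2, 22.0), ("book", "web")),
              ((9.0, 60.0), ("phone", "app")), ((8.5, 58.0), ("phone", "web"))]
    assign = k_prototypes(sample, k=2, gamma=1.0)
    domains = {c: build_domain([x for x, a in zip(sample, assign) if a == c])
               for c in set(assign)}
    print(label(((1.1, 59.0), ("phone", "web")), domains, threshold=0.5))
```

In the demo, the query record matches half the attributes of one domain and three quarters of the other, so with `threshold=0.5` it receives both cluster ids. Returning every domain that clears the threshold, rather than only the best one, is what lets a record carry more than one label, the property the abstract contrasts with single-label methods.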

Keywords: mixed data; clustering; user's interest domain; UIMCL algorithm

CLC number: TP312 [Automation and Computer Technology: Computer Software and Theory]
