Fair Data Summarization Based on k-Center Clustering and Nearest Neighbor Center

Authors: HE Yan (何艳) [1]; HUANG Qiao-ling (黄巧玲); ZHENG Bo-chuan (郑伯川)

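Affiliations: [1] School of Mathematics & Information, China West Normal University, Nanchong, Sichuan 637009, China; [2] School of Computer Science, China West Normal University, Nanchong, Sichuan 637009, China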

Source: Journal of China West Normal University (Natural Sciences), 2025, No. 1, pp. 95-103 (9 pages)

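Funding: National Natural Science Foundation of China, General Program (62176217); China West Normal University Research Innovation Team Project (KCXTD2022-3); China West Normal University Innovation and Entrepreneurship Project (cxcy2023050).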

Abstract: Fair data summarization selects a representative subset from each data category while satisfying a fairness requirement. In the era of big data, each category may contain a very large volume of data, so research on fair data summarization is of great practical importance. To make the points chosen for the summary more representative, we propose a fair data summarization algorithm based on k-center clustering and nearest neighbor centers. The algorithm consists of two basic steps: (1) k-center clustering is performed and the k cluster centers are taken as the current summary; (2) the nearest neighbors of the original cluster centers that satisfy the fairness constraint are selected as the new cluster centers. Because each new center is a nearest neighbor of an original center, it is more representative than a randomly selected data point: it is the next-best representative after the original center itself. Choosing nearest neighbors as the new centers also keeps the fairness cost as low as possible. Comparative experiments on 2 simulated datasets and 6 real UCI datasets show that the proposed algorithm outperforms the baseline in both approximation factor and fairness cost, indicating that it produces more representative data summaries.
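The two steps above can be illustrated with a minimal Python sketch. The abstract does not state which k-center algorithm is used, how the fairness constraint is formalized, or in what order centers are replaced, so the following are assumptions rather than the authors' method: step (1) uses Gonzalez's greedy farthest-first traversal (a standard 2-approximation for k-center); the fairness constraint is modeled as a per-group quota (exactly `quotas[g]` centers from group `g`); and step (2) repeatedly swaps a center of an over-represented group for its nearest non-center point in an under-represented group, taking the closest such pair first. The function names `gonzalez_k_center`, `fair_nearest_neighbor_centers` and `covering_radius` are hypothetical.

```python
import numpy as np


def gonzalez_k_center(X, k, first=0):
    """Step (1), assumed: greedy farthest-first traversal (Gonzalez),
    a 2-approximation for unconstrained k-center. Returns point indices."""
    centers = [first]
    # Distance from every point to its nearest chosen center so far.
    d = np.linalg.norm(X - X[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))  # the farthest point becomes the next center
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return centers


def covering_radius(X, centers):
    """Largest distance from any point to its nearest center (k-center cost)."""
    D = np.linalg.norm(X[:, None, :] - X[centers][None, :, :], axis=2)
    return float(D.min(axis=1).max())


def fair_nearest_neighbor_centers(X, groups, quotas, centers):
    """Step (2), hypothetical variant: while some group has more centers than
    its quota, replace one of its centers with the nearest non-center point
    from a group that is below quota, choosing the closest pair first."""
    if sum(quotas.values()) != len(centers):
        raise ValueError("quotas must sum to k")
    centers = list(centers)  # do not mutate the caller's list
    while True:
        counts = {g: 0 for g in quotas}
        for c in centers:
            counts[groups[c]] += 1
        over = {g for g in quotas if counts[g] > quotas[g]}
        under = {g for g in quotas if counts[g] < quotas[g]}
        if not over:  # quotas sum to k, so every group now meets its quota
            return centers
        best = None  # (distance, position in centers, replacement index)
        for i, c in enumerate(centers):
            if groups[c] not in over:
                continue
            for p in range(len(X)):
                if groups[p] in under and p not in centers:
                    dist = float(np.linalg.norm(X[c] - X[p]))
                    if best is None or dist < best[0]:
                        best = (dist, i, p)
        if best is None:
            raise ValueError("an under-represented group has too few points")
        _, i, p = best
        centers[i] = p


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 0.0],
                  [5.3, 0.0], [10.0, 0.0], [10.4, 0.0]])
    groups = ["a", "b", "a", "b", "a", "b"]
    centers = gonzalez_k_center(X, k=3)  # [0, 5, 3]: two "b" centers
    fair = fair_nearest_neighbor_centers(X, groups, {"a": 2, "b": 1}, centers)
    print(fair)  # [0, 5, 2]: center 3 is swapped for its nearest "a" point, 2
    print(covering_radius(X, centers), covering_radius(X, fair))
```

In the usage example, the unconstrained solution picks two centers from group "b", so the center at index 3 is replaced by its nearest neighbor of group "a" (index 2), and the covering radius does not grow. Comparing `covering_radius` before and after the swap is one hypothetical way to gauge how much representativeness is lost by enforcing fairness; the paper's exact definitions of approximation factor and fairness cost are not given in the abstract.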

Keywords: nearest neighbor point; k-center clustering; data summarization; fairness constraint

CLC number: TP311.1 (Automation and Computer Technology: Computer Software and Theory)