Authors: HE Yan (何艳) [1], HUANG Qiao-ling (黄巧玲), ZHENG Bo-chuan (郑伯川) (School of Mathematics & Information, China West Normal University, Nanchong, Sichuan 637009, China; School of Computer Science, China West Normal University, Nanchong, Sichuan 637009, China)
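Affiliations: [1] School of Mathematics and Information, China West Normal University, Nanchong, Sichuan 637009, China; [2] School of Computer Science, China West Normal University, Nanchong, Sichuan 637009, China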
Source: Journal of China West Normal University (Natural Sciences) (《西华师范大学学报(自然科学版)》), 2025, No. 1, pp. 95-103 (9 pages)
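Funding: National Natural Science Foundation of China, General Program (62176217); China West Normal University Research Innovation Team Project (KCXTD2022-3); China West Normal University Innovation and Entrepreneurship Project (cxcy2023050).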
Abstract: Fair data summarization refers to selecting a representative subset from each data category while satisfying a fairness requirement. In the era of big data, each category may contain a very large volume of data, so research into fair data summarization is of great practical importance. To make the data points in a fair summary more representative, we propose a fair data summarization algorithm based on k-center clustering and nearest-neighbor centers. The algorithm consists of two basic steps: (1) run k-center clustering and take the k cluster centers as the current summary; (2) select, as the new cluster centers, the nearest neighbors of the original cluster centers that satisfy the fairness constraints. Because each new center is the nearest neighbor of an original center, it is more representative than a randomly selected data point and is the next-best representative after the original center itself. Choosing nearest neighbors as the new centers also keeps the fairness cost as small as possible. Comparison experiments on 2 simulated datasets and 6 real UCI datasets show that the proposed algorithm outperforms the compared algorithms in both approximation factor and fairness cost, indicating that the summaries it produces are more representative.
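The two steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not say which k-center procedure is used or how the fairness constraint is expressed. The sketch assumes the standard greedy farthest-first traversal for step (1) and a per-category quota (the hypothetical `quotas` argument) for step (2). The function names `greedy_k_center` and `fair_summary` are made up for this example.

```python
import math

def greedy_k_center(points, k):
    """Farthest-first traversal, a standard greedy heuristic for k-center.
    Assumption: the traversal starts from the first point."""
    centers = [0]
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return centers

def fair_summary(points, groups, quotas):
    """Replace each k-center center by its nearest point whose category
    still has quota left. Assumes every category has at least as many
    points as its quota, and that k is the sum of the quotas."""
    k = sum(quotas.values())
    remaining = dict(quotas)
    chosen = []
    for c in greedy_k_center(points, k):
        # Candidates ordered by distance to the original center;
        # the center itself comes first (distance 0).
        by_distance = sorted(range(len(points)),
                             key=lambda i: math.dist(points[i], points[c]))
        for i in by_distance:
            if i not in chosen and remaining[groups[i]] > 0:
                chosen.append(i)
                remaining[groups[i]] -= 1
                break
    return chosen

points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 5)]
groups = ["a", "a", "b", "a", "a"]
quotas = {"a": 1, "b": 1}  # hypothetical constraint: one center per category
print(greedy_k_center(points, 2))            # [0, 3]
print(fair_summary(points, groups, quotas))  # [0, 2]
```

In this toy input, plain k-center picks points 0 and 3, both from category "a". Under the assumed quota, the second center is replaced by its nearest neighbor in category "b" (point 2), which moves the center by only one unit.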
Keywords: nearest neighbor; k-center clustering; data summarization; fairness constraint
Classification number: TP311.1 [Automation and Computer Technology: Computer Software and Theory]