检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
出 处:《应用数学进展》2025年第1期318-330,共13页Advances in Applied Mathematics
基 金:河北省中央引导地方科技专项项目(246Z1825G);河北省高等学校科学研究项目(QN2020188, ZD2020185);河北省“三三三人才工程”资助项目(C20221026)。
摘 要:分类问题是数据挖掘、机器学习等领域的基础性问题之一,然而多数分类方法仅关注向量值样本的分类问题,而对于实际中广泛存在的集值型数据样本的分类关注较少。本文提出了一种基于Wasserstein距离的无监督聚类算法(Wk-means),利用熵正则最优传输模型度量集值型数据点之间的距离,并结合聚类的思想设计了一个可用于集值型数据的Wk-means聚类方法。为验证方法的有效性,本文首先在几个公开数据集上进行了实验,结果证实了Wk-means在多样本、多类别、多特征的集值型数据中表现优异,并且通过统计检验表明本文算法与其他算法存在显著差异。随后将本文方法实际应用于滏阳河水质数据集,结果同样表明相比传统的数据聚类算法,Wk-means能够更准确地划分水质类别,且运行效率更高。本文提出的Wk-means算法在集值型水质数据的分类任务中表现出色,能够为环境监测和管理提供有价值的决策支持。Classification is one of the basic problems in data mining, machine learning and other fields. However, most classification methods only focus on the vector-valued samples, while paying less attention to the classification of set-valued data samples that are widely existed in practice. This paper proposes an unsupervised clustering algorithm (Wk-means) based on Wasserstein distance. Combined with the idea of clustering, Wk-means can be used for set-valued samples, in which the entropy-regularized optimal transport model is used to measure the distance between set-valued samples. In order to verify the effectiveness of Wk-means, experiments are conducted firstly on several public data sets. The results confirm the excellent performance of Wk-means in set-valued data with multi-sample, multi-category, and multi-feature. Moreover, the statistical test show that Wk-means is significantly different from other algorithms. Wk-means is then applied to the Fuyang River water quality data set. The res
关 键 词:集值型数据 分类问题 Wasserstein距离 最优传输 水质分类
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.7