Selecting distance metrics for incremental clustering algorithm of high dimensional data

Cited by: 6

Authors: SHAO Jun-jian (邵俊健); WANG Shi-tong (王士同) [1] (School of Digital Media, Jiangnan University, Wuxi 214122, China)

Source: Computer Engineering & Science (《计算机工程与科学》), 2019, No. 2, pp. 214-223 (10 pages)

Abstract: The choice of distance metric function has a significant effect on clustering results. This paper analyzes distance metric selection for large-scale, high-dimensional datasets using an incremental fuzzy clustering algorithm. The SpFCM algorithm splits a large dataset into small chunks and clusters them incrementally in batches, so it can obtain good clustering results within limited computer memory. Building on the traditional SpFCM algorithm, different distance metric functions are used to measure the similarity between samples in order to determine how each metric affects SpFCM. Four metrics are evaluated on several large-scale, high-dimensional datasets: Euclidean distance, cosine distance, correlation coefficient distance, and extended Jaccard distance. Experimental results show that the latter three metrics greatly improve clustering quality relative to Euclidean distance. Among them, correlation coefficient distance gives the better results, while cosine distance and extended Jaccard distance perform only moderately.
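For readers unfamiliar with the four metrics named in the abstract, the following is a minimal sketch of how they are commonly defined. It assumes the usual textbook forms (cosine and correlation distance as one minus the corresponding similarity, extended Jaccard as one minus the Tanimoto coefficient); the abstract does not give formulas, so the paper's exact formulations may differ. The function names are illustrative and do not come from the paper.

```python
import math


def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def euclidean_distance(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_distance(x, y):
    """1 - cosine similarity. Assumes neither vector is all zeros."""
    return 1.0 - dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def correlation_distance(x, y):
    """1 - Pearson correlation, i.e. cosine distance after mean-centering.

    Assumes neither vector is constant (otherwise the norm is zero).
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_distance([a - mean_x for a in x], [b - mean_y for b in y])


def extended_jaccard_distance(x, y):
    """1 - Tanimoto coefficient: x.y / (|x|^2 + |y|^2 - x.y).

    Assumes the vectors are not both all zeros.
    """
    xy = dot(x, y)
    return 1.0 - xy / (dot(x, x) + dot(y, y) - xy)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [2.0, 4.0, 6.0]  # same direction as x, twice the length
    for fn in (euclidean_distance, cosine_distance,
               correlation_distance, extended_jaccard_distance):
        print(fn.__name__, fn(x, y))
```

In this example `y` is a scaled copy of `x`, so the cosine and correlation distances are zero up to rounding, while the Euclidean and extended Jaccard distances are not: the first two ignore vector length, the last two do not.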

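Keywords: high-dimensional data; SpFCM algorithm; distance metric; incremental fuzzy clustering algorithm; correlation coefficient distance metric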

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
