Improvement and Comparison of Functional Clustering Algorithms Based on Nonnegative Matrix Factorization


Authors: Wang Bingcan [1]; Wei Yanhua [1]; Li Xu [2]

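Affiliations: [1] School of Mathematics and Statistics, Tianshui Normal University, Tianshui, Gansu 741001, China; [2] School of Mathematics and Computer Science, Shanxi Normal University, Taiyuan 030000, China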

Source: Statistics & Decision (《统计与决策》), 2024, No. 15, pp. 46-52 (7 pages)

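Funding: Youth Program of the Natural Science Foundation of Shanxi Province (202203021222223); High-Level Talent Research Project of Tianshui Normal University (KYQ2023-13).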

Abstract: Nonnegative functional data can be observed at unequal intervals and are widely used in theory and practice; clustering them helps reveal underlying regularities. This paper uses a location integral transformation to convert functional data into high-dimensional vectors, reduces these to low-dimensional vectors by nonnegative matrix factorization (NMF), and builds functional clustering algorithms on that basis. For the NMF-based functional spectral clustering algorithm, two methods are given for determining the number of clusters K: one determines K from the eigenvalues of the Laplacian matrix; the other constructs new evaluation indices and determines K by search. The numerical experiments show the following. The functional clustering algorithms based on the location integral transformation and NMF are effective and place only loose requirements on the structure of the functions, but the function values must be restricted to be positive. The rank of the NMF can be determined by the cophenetic correlation coefficient, and a small value is recommended so as to remove redundant features of the classes. When determining K for spectral clustering, it is recommended to standardize the dimension-reduced data in order to narrow the range of variation of the distances between samples. The change-point plot of the number of clusters is intuitive and effective, and combining it with the eigenvalue difference method gives a useful reference for K, with a recommended threshold in [0.05, 0.08]. The method of determining K from the coincidence degree and the similarity ratio is effective and easy to understand.
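The pipeline outlined in the abstract (nonnegative vectors, NMF reduction, standardization, Laplacian eigenvalues, choice of K) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the curves are synthetic and sampled on a regular grid, so the location integral transformation is skipped; NMF is a hand-written multiplicative-update routine; the affinity is Gaussian with a median-distance bandwidth; and K is chosen with the generic largest-eigengap heuristic, standing in for the paper's eigenvalue difference rule and its [0.05, 0.08] threshold, whose exact definition the abstract does not give. All function names are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-10):
    """Multiplicative-update NMF: V (n x p, nonnegative) ~ W (n x rank) @ H (rank x p)."""
    rng = np.random.default_rng(seed)
    n, p = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, p)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def gaussian_affinity(X, sigma=None):
    """Gaussian affinity; the median-distance bandwidth is an assumed default."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    if sigma is None:
        sigma = np.sqrt(np.median(sq[sq > 0]))
    return np.exp(-sq / (2.0 * sigma ** 2))

def normalized_laplacian_eigenvalues(A):
    """Ascending eigenvalues of L = I - D^{-1/2} A D^{-1/2} for a symmetric affinity A."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(L)

def eigengap_k(eigvals, k_max=10):
    """Generic eigengap heuristic: K is the position of the largest gap
    among the k_max + 1 smallest eigenvalues (not the paper's exact rule)."""
    gaps = np.diff(eigvals[: k_max + 1])
    return int(np.argmax(gaps)) + 1

# Synthetic positive curves from three shapes, 20 noisy copies each (assumed data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
shapes = [1 + np.sin(np.pi * t), 1 + t ** 2, 2 - t]
V = np.vstack([s + 0.05 * rng.random((20, t.size)) for s in shapes])

W, H = nmf(V, rank=3)                              # low-dimensional representation W
Z = (W - W.mean(axis=0)) / (W.std(axis=0) + 1e-12)  # standardize before spectral step
eigvals = normalized_laplacian_eigenvalues(gaussian_affinity(Z))
k_hat = eigengap_k(eigvals)
print("estimated number of clusters:", k_hat)
```

The standardization step mirrors the abstract's recommendation to rescale the reduced data before the spectral step. The estimate depends on the affinity bandwidth and on the NMF rank, so the printed value should be read as an illustration rather than a reproduction of the paper's results.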

Keywords: functional data; nonnegative matrix factorization; spectral clustering; number of clusters

Classification code: O212 [Science: Probability Theory and Mathematical Statistics]

 
