A Method for Testing the Nonlinear Features of Data in Nonlinear Principal Component Analysis  Cited by: 2

A New Method to Test the Nonlinear Feature in Nonlinear Principal Component Analysis

Authors: Gao Qingsong (高青松)[1], Xue Fuzhong (薛付忠)[1]

Affiliation: [1] School of Public Health, Shandong University, 250012

Source: Chinese Journal of Health Statistics (《中国卫生统计》), 2011, No. 5, pp. 488-491, 494 (5 pages)

Funding: National Natural Science Foundation of China (30871392)

Abstract: Objective: To propose a nonlinearity measure for testing the nonlinear features of data in nonlinear principal component analysis (NLPCA), so as to determine whether the relationships among the variables in a given data set are linear or nonlinear and to guide the choice between a linear and a nonlinear PCA model. Methods: The original data set was divided into several regions, and accuracy bounds were estimated for each region from the confidence limits of its correlation matrix. The residual variances of each region were then compared with the corresponding accuracy bounds to determine whether the relationships among the variables in the data set were linear or nonlinear. Results: In the two simulated examples the results matched expectations: in the linear example all residual variances lay within the accuracy bounds, whereas in the nonlinear example some residual variances fell outside them. For the real data set, at least one residual variance fell outside the accuracy bounds, implying that a nonlinear PCA model was required for this data set. Conclusion: The proposed nonlinearity measure can effectively detect whether the relationships among the variables in a given data set are linear or nonlinear. It therefore provides a basis for deciding whether to apply nonlinear PCA, so that a more suitable model can be chosen to make better use of the information in the data.
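The procedure outlined in the Methods part of the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the formulas, so the code below assumes a two-variable case, splits the data by sorting on the first variable, uses a Fisher z confidence interval (assumed 95% level, `z_crit=1.96`) for the correlation coefficient, and takes the accuracy bounds to be the resulting range of the discarded eigenvalue 1 - |r| of the 2x2 correlation matrix. All function names (`split_regions`, `residual_bounds`, `count_violations`) and the simulated data are hypothetical.

```python
import numpy as np


def split_regions(X, n_regions):
    """Sort rows by the first variable and cut them into disjoint regions."""
    order = np.argsort(X[:, 0])
    return np.array_split(X[order], n_regions)


def residual_bounds(r, n, z_crit=1.96):
    """Bounds on the discarded eigenvalue 1 - |r| of a 2x2 correlation matrix.

    Assumption: the bounds come from a Fisher z confidence interval on |r|;
    the lower end of the interval is clipped at zero correlation.
    """
    z = np.arctanh(abs(r))
    half_width = z_crit / np.sqrt(n - 3)
    r_lo = np.tanh(max(z - half_width, 0.0))
    r_hi = np.tanh(z + half_width)
    return 1.0 - r_hi, 1.0 - r_lo


def count_violations(X, n_regions=4):
    """Count (model region, test region) pairs whose residual variance
    falls outside the bounds derived from the model region."""
    regions = split_regions(X, n_regions)
    violations = 0
    for h, ref in enumerate(regions):
        # Scale every region with the mean and standard deviation of region h.
        mean, std = ref.mean(axis=0), ref.std(axis=0, ddof=1)
        corr = np.corrcoef(ref, rowvar=False)
        lower, upper = residual_bounds(corr[0, 1], len(ref))
        # eigh returns eigenvalues in ascending order, so column 0 is the
        # direction discarded by a one-component PCA model of region h.
        _, vecs = np.linalg.eigh(corr)
        discarded = vecs[:, 0]
        for j, other in enumerate(regions):
            if j == h:
                continue
            scores = ((other - mean) / std) @ discarded
            res_var = scores.var(ddof=1)
            if not lower <= res_var <= upper:
                violations += 1
    return violations


# Simulated data (illustrative only): one linear and one quadratic relation.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 1000)
noise = rng.normal(0.0, 0.02, 1000)
linear = np.column_stack([x, x + noise])
nonlinear = np.column_stack([x, x**2 + noise])

linear_violations = count_violations(linear)
nonlinear_violations = count_violations(nonlinear)
print("linear:", linear_violations, "nonlinear:", nonlinear_violations)
```

Under these assumptions the quadratic data set should produce many more violations than the linear one, because a local PCA model fitted in one region describes the other regions poorly when the slope changes. At the assumed 95% level an occasional violation in the linear case is still possible with finite samples, so the count is better read as a comparison than as a strict zero-or-nonzero rule.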

Keywords: principal components; clustering; nonlinearity measure; accuracy bounds

Classification code: O212.1 [Science: Probability Theory and Mathematical Statistics]

 
