Research of the Effect of Multiple Cross-validation on PLSDA Model  Cited by: 5


Authors: Qu Siyang (曲思杨), Zhang Qiuju (张秋菊)[1], Wang Wenji (王文佶)[1], Xie Biao (谢彪)[1], Sun Lin (孙琳)[1], Gao Bing (高兵)[1], Liu Meina (刘美娜)[1]

Affiliation: [1] Department of Health Statistics, School of Public Health, Harbin Medical University, 150081

Source: Chinese Journal of Health Statistics (《中国卫生统计》), 2017, No. 1, pp. 15-17, 22 (4 pages)

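Funding: Key Project of the Natural Science Foundation of Heilongjiang Province (ZD201314); National Natural Science Foundation of China (81502889)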

Abstract: Objective: To compare the effect of a single cross-validation and of multiple cross-validations on the optimal PLSDA model, and to examine how multiple cross-validation affects the stability of the optimal model both when all individuals are correctly grouped and when a few individuals are misclassified. Methods: The order of individuals in the dataset was shuffled to perform multiple cross-validations. Simulated data and real data were analyzed with a single cross-validation and with multiple cross-validations, and the variability and stability of the models were assessed using parameters such as the number of components and the MSEP. Results: For the simulated data, a single cross-validation gave 3 components and an MSEP of 0.3792. Across 5000 cross-validations with the data labels left unshuffled, the number of components ranged from 2 to 6 and the MSEP from 0.2569 to 0.5794; with 5% of the labels shuffled, the number of components ranged from 1 to 8 and the MSEP from 0.2061 to 0.6463. For the real data, a single cross-validation gave 4 components and an MSEP of 0.1376; across 10000 cross-validations, the number of components ranged from 4 to 10 and the MSEP from 0.0802 to 0.3761. Conclusion: The result of a single cross-validation is unstable. When building PLSDA models, multiple cross-validation can yield a stable model even when a few individuals are misclassified, so multiple cross-validation is recommended to ensure the stability of PLSDA models.
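The procedure described in the Methods (reshuffle the individuals, rerun the cross-validation, record the selected number of components and the MSEP each time) can be sketched as follows. This is an illustrative sketch only, not the authors' code. Several things here are assumptions that do not come from the record: the simulated data layout in the hypothetical helper `make_data`, the 7-fold split, the 0/1 coding of the class label as the PLS response, the NumPy NIPALS implementation of PLS1, and choosing the "optimal" model as the component count with the smallest cross-validated MSEP (mean squared error of prediction).

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS. Returns one coefficient vector per component count 1..n_components."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:          # no covariance left to extract
            break
        w = w / norm
        t = Xk @ w                # scores
        tt = t @ t
        p = Xk.T @ t / tt         # X loadings
        c = yk @ t / tt           # y loading
        Xk = Xk - np.outer(t, p)  # deflate X and y
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    betas = []
    for a in range(1, len(W) + 1):
        Wa, Pa = np.array(W[:a]).T, np.array(P[:a]).T
        betas.append(Wa @ np.linalg.solve(Pa.T @ Wa, np.array(q[:a])))
    while len(betas) < n_components:  # pad if extraction stopped early
        betas.append(betas[-1] if betas else np.zeros(X.shape[1]))
    return betas, x_mean, y_mean


def cv_msep(X, y, max_components, n_folds, rng):
    """One k-fold cross-validation on a fresh random ordering of the individuals."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    sq_err = np.zeros(max_components)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        betas, x_mean, y_mean = pls1_fit(X[train], y[train], max_components)
        for a, beta in enumerate(betas):
            pred = (X[test_idx] - x_mean) @ beta + y_mean
            sq_err[a] += np.sum((y[test_idx] - pred) ** 2)
    return sq_err / n  # MSEP for 1..max_components


def repeated_cv(X, y, max_components=8, n_folds=7, n_repeats=200, seed=0):
    """Repeat the cross-validation; keep the selected component count and its MSEP per run."""
    rng = np.random.default_rng(seed)
    msep = np.array([cv_msep(X, y, max_components, n_folds, rng)
                     for _ in range(n_repeats)])
    return msep.argmin(axis=1) + 1, msep.min(axis=1)


def make_data(n=60, p=200, flip_frac=0.0, seed=1):
    """Hypothetical two-class simulation; flip_frac mislabels that share of individuals."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0.0, 1.0], n // 2)
    X = rng.normal(size=(n, p))
    X[:, :10] += y[:, None]       # first 10 variables carry the class signal
    flip = rng.choice(n, size=int(round(flip_frac * n)), replace=False)
    y_obs = y.copy()
    y_obs[flip] = 1 - y_obs[flip]
    return X, y_obs


for frac in (0.0, 0.05):
    X, y = make_data(flip_frac=frac)
    n_comp, msep = repeated_cv(X, y, n_repeats=50)
    print(f"mislabelled {frac:.0%}: components {n_comp.min()}-{n_comp.max()}, "
          f"MSEP {msep.min():.4f}-{msep.max():.4f}")
```

Setting `n_repeats=1` corresponds to the single cross-validation case. The spread of `n_comp` and `msep` over many repeats is the kind of variability the abstract reports as ranges, although the numbers produced by this toy simulation are not comparable to the published ones.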

Keywords: cross-validation; PLSDA; high-dimensional data

Classification code: O212.1 [Science: Probability Theory and Mathematical Statistics]
