Comparison of prediction approaches for missing observations of multi-trait phenotypes in natural population

Original title: 自然群体多性状表型缺失值预测方法的比较 (cited by: 1)

Authors: WANG Yuan (王媛), WEN Yangjun (温阳俊), WANG Yanping (王艳萍), LIU Hanqin (刘汉钦), MA Ruoxun (马若洵), WU Qingtai (吴清太) [1], ZHANG Jin (张瑾) [1] (College of Sciences, Nanjing Agricultural University, Nanjing 210095, China)

Affiliation: [1] College of Sciences, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China

Source: Journal of Nanjing Agricultural University (《南京农业大学学报》), 2022, No. 2, pp. 395-403 (9 pages)

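Funding: National Natural Science Foundation of China, Young Scientists Fund (32070688, 31301229); Fundamental Research Funds for the Central Universities (JCQY202108).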

Abstract: [Objectives] This paper explores how well multi-trait joint imputation predicts missing phenotypes under various conditions. Predicting missing phenotypes with statistical methods can effectively increase the sample size and improve the accuracy of data analysis. [Methods] In simulation studies, six imputation methods, namely mean imputation, K-nearest neighbor (KNN), decision tree, multiple imputation by chained equations (MICE), PHENIX (phenotype imputation expediated) and softImpute, were used to predict simulated missing multi-trait data, and their performance was compared under different phenotypic missing rates, numbers of traits, sample sizes and trait correlations. For real Arabidopsis thaliana data, missing phenotypic values for days to flowering under long day, short day, long day with vernalization and short day with vernalization were imputed jointly across traits, and the reliability of the imputed data was verified by genome-wide association analysis. [Results] The simulation studies showed that imputation accuracy decreased as the phenotypic missing rate increased, improved as the number of traits and the trait correlation increased, and became more stable with larger sample sizes. In the real data analysis, multi-trait joint imputation behaved similarly to the simulation experiments, and the reliability of the imputed data was checked by genome-wide association analysis and previously validated genes. [Conclusions] The phenotypic missing rate, the number of traits and the trait correlation strongly affect the imputation of missing data. The multi-trait joint imputation methods PHENIX, decision tree and KNN can exploit the genetic structure among traits and are therefore more accurate and effective in both the simulation studies and the real data analysis.
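The kind of simulation comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code, and it covers only two of the six methods (mean imputation and a simple KNN); PHENIX, MICE, softImpute and the decision tree are not reproduced. The shared-factor simulation design, the completely-at-random masking, the RMSE criterion, and all function names and default parameter values below are assumptions made for illustration only.

```python
import math
import random
import statistics


def simulate_traits(n_samples, n_traits, rho, rng):
    """Simulate traits sharing one latent factor, so each pair has correlation rho."""
    rows = []
    for _ in range(n_samples):
        shared = rng.gauss(0.0, 1.0)
        rows.append([
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(n_traits)
        ])
    return rows


def mask_at_random(data, missing_rate, rng):
    """Set entries to None completely at random; keep one observed trait per row."""
    masked = []
    for row in data:
        new_row = [None if rng.random() < missing_rate else v for v in row]
        if all(v is None for v in new_row):
            keep = rng.randrange(len(row))
            new_row[keep] = row[keep]
        masked.append(new_row)
    return masked


def column_means(masked):
    """Mean of the observed values of each trait."""
    return [
        statistics.fmean(r[j] for r in masked if r[j] is not None)
        for j in range(len(masked[0]))
    ]


def impute_mean(masked):
    """Replace each missing value with the observed mean of its trait."""
    means = column_means(masked)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in masked]


def impute_knn(masked, k=5):
    """Replace each missing value with the mean of that trait over the k nearest
    samples, using mean squared difference over the traits both samples have."""
    means = column_means(masked)
    imputed = [row[:] for row in masked]
    for i, row in enumerate(masked):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other in masked:
                if other[j] is None:  # also skips the row itself
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((sum(shared) / len(shared), other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [val for _, val in candidates[:k]]
            # Fall back to the trait mean if no comparable sample exists.
            imputed[i][j] = statistics.fmean(nearest) if nearest else means[j]
    return imputed


def rmse_on_masked(truth, masked, imputed):
    """Root mean squared error over the entries that were masked."""
    errors = [(imputed[i][j] - truth[i][j]) ** 2
              for i, row in enumerate(masked)
              for j, v in enumerate(row) if v is None]
    return math.sqrt(statistics.fmean(errors))


def compare_methods(n_samples=300, n_traits=4, rho=0.8, missing_rate=0.1, k=5, seed=1):
    """Run one simulated replicate and return the RMSE of each method."""
    rng = random.Random(seed)
    truth = simulate_traits(n_samples, n_traits, rho, rng)
    masked = mask_at_random(truth, missing_rate, rng)
    return {
        "mean": rmse_on_masked(truth, masked, impute_mean(masked)),
        "knn": rmse_on_masked(truth, masked, impute_knn(masked, k)),
    }


if __name__ == "__main__":
    print(compare_methods())
```

Calling `compare_methods` with different values of `missing_rate`, `n_traits`, `n_samples` and `rho` gives a rough way to explore the four factors the abstract names. Because KNN borrows information from correlated traits while mean imputation ignores them, the gap between the two is expected to widen as `rho` grows, though a single replicate like this is noisy and is no substitute for the study's repeated simulations.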

Keywords: phenotype; missing data; prediction; imputation; multi-trait; gene

Classification code: Q348 [Biology: Genetics]
