Authors: WANG Yuan (王媛); WEN Yangjun (温阳俊); WANG Yanping (王艳萍); LIU Hanqin (刘汉钦); MA Ruoxun (马若洵); WU Qingtai (吴清太)[1]; ZHANG Jin (张瑾)[1] (College of Sciences, Nanjing Agricultural University, Nanjing 210095, China)
Source: Journal of Nanjing Agricultural University (《南京农业大学学报》), 2022, No. 2, pp. 395-403 (9 pages)
Funding: National Natural Science Foundation of China, Young Scientists Fund (32070688, 31301229); Fundamental Research Funds for the Central Universities (JCQY202108).
Abstract: [Objectives] This study explored how well multi-trait joint imputation predicts missing phenotypes under various conditions. Predicting missing phenotypes with statistical methods increases the usable sample size and improves the accuracy of data analysis. [Methods] In the simulation studies, six imputation methods were used to predict missing values in multi-phenotype data: mean imputation, K-nearest neighbor (KNN), decision tree, multiple imputation by chained equations (MICE), PHENIX (phenotype imputation expediated) and softImpute. Their performance was compared across phenotypic missing rates, numbers of traits, sample sizes and trait correlations. In addition, missing values in real Arabidopsis thaliana phenotypic data for four traits (days to flowering under long day, under short day, under long day with vernalization, and under short day with vernalization) were imputed jointly, and the reliability of the imputed data was verified by genome-wide association analysis. [Results] The simulation study showed that imputation accuracy decreased as the phenotypic missing rate increased, that accuracy improved with more traits and stronger trait correlation, and that larger sample sizes gave more stable results. In the real data analysis, multi-trait joint imputation behaved consistently with the simulation experiments, and the reliability of the imputed data was confirmed by genome-wide association analysis and previously validated genes. [Conclusions] Phenotypic missing rate, number of traits and trait correlation strongly affect imputation performance. The multi-trait imputation methods PHENIX, decision tree and KNN can exploit the genetic structure among traits and were therefore more accurate and effective in both the simulation study and the real data analysis.
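The simulation design described in the abstract (mask phenotypes at a chosen missing rate, impute, compare against the held-back truth) can be illustrated with a small standard-library sketch. This is not the authors' code: it covers only mean imputation and a simplified KNN, and PHENIX, MICE, softImpute and decision trees are not implemented. The shared-latent-factor simulation, the use of RMSE as the accuracy measure, and all function names (`simulate_traits`, `mask_at_random`, `impute_mean`, `impute_knn`, `run_once`) are assumptions made for illustration.

```python
import math
import random


def simulate_traits(n, p, rho, rng):
    """n samples of p traits with pairwise correlation about rho (assumed model:
    one shared latent factor plus independent noise, unit variance per trait)."""
    rows = []
    for _ in range(n):
        shared = rng.gauss(0, 1)
        rows.append([math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
                     for _ in range(p)])
    return rows


def mask_at_random(data, rate, rng):
    """Copy the data and set each cell to None with probability `rate`."""
    masked = [row[:] for row in data]
    holes = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < rate:
                row[j] = None
                holes.append((i, j))
    return masked, holes


def column_means(masked):
    """Per-trait mean over observed cells (0.0 if a trait has none)."""
    means = []
    for j in range(len(masked[0])):
        vals = [row[j] for row in masked if row[j] is not None]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return means


def impute_mean(masked):
    """Fill every missing cell with its trait's observed mean."""
    means = column_means(masked)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def impute_knn(masked, k=5):
    """Simplified KNN: average trait j over the k samples closest to sample i,
    using mean squared distance on traits observed in both samples."""
    means = column_means(masked)
    filled = [row[:] for row in masked]
    for i, row in enumerate(masked):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for other in masked:
                if other[j] is None:  # also skips the sample itself
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((sum(shared) / len(shared), other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [val for _, val in candidates[:k]]
            # Fall back to the trait mean when no comparable sample exists.
            filled[i][j] = sum(nearest) / len(nearest) if nearest else means[j]
    return filled


def rmse(truth, filled, holes):
    """Root mean squared error over the masked cells only."""
    return math.sqrt(sum((truth[i][j] - filled[i][j]) ** 2 for i, j in holes)
                     / len(holes))


def run_once(rho, rate, seed, n=200, p=6):
    """One simulation replicate; returns (RMSE of mean, RMSE of KNN)."""
    rng = random.Random(seed)
    truth = simulate_traits(n, p, rho, rng)
    masked, holes = mask_at_random(truth, rate, rng)
    return (rmse(truth, impute_mean(masked), holes),
            rmse(truth, impute_knn(masked), holes))


if __name__ == "__main__":
    for rho in (0.1, 0.5, 0.8):
        for rate in (0.05, 0.2, 0.4):
            mean_err, knn_err = run_once(rho, rate, seed=1)
            print(f"rho={rho} rate={rate} mean={mean_err:.3f} knn={knn_err:.3f}")
```

Varying `rho`, `rate`, `n` and `p` in `run_once` mirrors the four factors compared in the study; a real analysis would average over many replicates rather than a single seed.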