Research on a Two-Stage Prediction Method for Pig Traits Based on Group Feature Selection

Authors: CHEN Yifei; SU Ruilin; SHEN Zhencai; TAN Junyan[1]; ZHONG Ping[1] (College of Science, China Agricultural University, Beijing 100083, China)

Affiliation: [1] College of Science, China Agricultural University, Beijing 100083, China

Source: China Swine Industry (《中国猪业》), 2024, No. 6, pp. 33-41 (9 pages)

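Funding: 2024 Beijing Municipal Undergraduate Innovation Training Program (S202410019026); Determining Optimized Analysis Methods for the Identification of Major Livestock Breeds (19230535).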

Abstract: Genomic selection uses whole-genome molecular marker data, such as single nucleotide polymorphisms (SNPs), to estimate genomic estimated breeding values (GEBVs). The technique is changing how breeding values are estimated in livestock, poultry, and plant breeding. Accurate estimation of breeding values depends on accurately predicting phenotypic values from given genotype data. However, existing methods for estimating animal phenotypes do not adequately account for the fact that not all SNP loci have biological effects. This study proposes a two-stage phenotype prediction method (TSPM) based on group feature selection and machine learning. The method first applies the K-means clustering algorithm to group the features and selects the feature groups related to the phenotype; it then applies kernel ridge regression to the feature-selected dataset to predict phenotypic values. To verify its effectiveness, the proposed method was compared on real datasets with eight classical methods, including genomic best linear unbiased prediction (GBLUP) and support vector regression (SVR). The experimental results show that TSPM has stronger predictive ability than most of the machine learning methods, with the advantage most pronounced for traits with high heritability. Compared with classical GBLUP, the accuracy of the proposed method is 3.86% higher.
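The two stages described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the abstract does not state how feature groups are scored, so here each group is scored by the absolute Pearson correlation between its mean genotype and the phenotype; the RBF kernel, the values of `k`, `keep`, `lam`, and `gamma`, the simulated 0/1/2 genotype data, and all function names are likewise hypothetical.

```python
import math
import random

def dist2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def pearson(u, v):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def select_groups(X, y, labels, keep):
    """Stage 1 (assumed scoring rule): rank feature groups by
    |corr(group-mean genotype, y)| and return the feature indices
    belonging to the `keep` best groups."""
    scores = {}
    for g in set(labels):
        idx = [j for j, lab in enumerate(labels) if lab == g]
        group_mean = [sum(row[j] for j in idx) / len(idx) for row in X]
        scores[g] = abs(pearson(group_mean, y))
    best = sorted(scores, key=scores.get, reverse=True)[:keep]
    return sorted(j for j, lab in enumerate(labels) if lab in best)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def rbf(u, v, gamma):
    """RBF kernel (the kernel choice is an assumption)."""
    return math.exp(-gamma * dist2(u, v))

def krr_fit(X, y, lam, gamma):
    """Stage 2: kernel ridge regression, solving (K + lam*I) alpha = y."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, y)

def krr_predict(X_train, alpha, X_new, gamma):
    """Predict f(x) = sum_i alpha_i * k(x, x_i) for each new sample."""
    return [sum(a * rbf(x, xt, gamma) for a, xt in zip(alpha, X_train))
            for x in X_new]

# Simulated data (hypothetical): 30 individuals, 10 SNPs coded 0/1/2;
# only the first two SNPs affect the phenotype.
rng = random.Random(1)
X = [[rng.choice([0, 1, 2]) for _ in range(10)] for _ in range(30)]
y = [row[0] + row[1] + rng.gauss(0, 0.3) for row in X]

labels = kmeans(list(zip(*X)), k=3)          # cluster SNP columns into groups
selected = select_groups(X, y, labels, keep=2)
X_sel = [[row[j] for j in selected] for row in X]

alpha = krr_fit(X_sel[:20], y[:20], lam=1.0, gamma=0.1)
preds = krr_predict(X_sel[:20], alpha, X_sel[20:], gamma=0.1)
print("selected SNP indices:", selected)
print("correlation on held-out individuals:", pearson(preds, y[20:]))
```

The sketch clusters SNP columns rather than individuals, so each K-means cluster is a group of similarly distributed markers, and the regression stage only ever sees the markers from the retained groups. Pearson correlation between predicted and observed phenotypes is used here as one common accuracy measure; the abstract does not say which measure the reported 3.86% refers to.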

Keywords: group feature selection; breeding; phenotype prediction; gene selection; ridge regression

Classification: S828 [Agricultural Sciences: Animal Husbandry]; S813 [Agricultural Sciences: Animal Husbandry and Veterinary Medicine]

 
