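Affiliation: [1] Institute of Animal Sciences, Chinese Academy of Agricultural Sciences / Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Beijing 100193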
Source: Scientia Agricultura Sinica (《中国农业科学》), 2023, No. 18, pp. 3682-3692 (11 pages)
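Funding: General Program of the National Natural Science Foundation of China (32172702); National Key Research and Development Program of China (2021YFD130110203); Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (ASTIP-IAS02); National Swine Industry Technology System (CARS-35).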
Abstract: Genomic selection uses molecular marker information covering the whole genome to estimate the breeding values of individuals. Using genomic information avoids many problems caused by pedigree errors, improves selection accuracy, and shortens the generation interval. According to the statistical model used, genomic selection methods can be broadly divided into methods based on BLUP (best linear unbiased prediction) theory, methods based on Bayesian theory, and other methods. At present, GBLUP and its improved version ssGBLUP are the most widely used. Accuracy is the most commonly used evaluation metric for genomic selection models; it measures the similarity between true and estimated values. The factors that affect accuracy are reflected in the model and can be broadly divided into controllable and uncontrollable factors. Traditional genomic selection methods have driven rapid progress in animal breeding, but they now face challenges involving multiple populations, multi-omics data, and computation, and they cannot capture nonlinear relationships in high-dimensional genomic data. As a branch of artificial intelligence, machine learning is the approach that most closely resembles the way living organisms acquire natural language processing ability. Machine learning extracts features from data, automatically summarizes the underlying rules, and uses those rules to make predictions for new data. For genomic information, machine learning requires no distributional assumptions, and all marker information can be included in the model. Compared with traditional genomic selection methods, machine learning more readily captures complex relationships among genotypes and between phenotypes and the environment. Machine learning therefore has certain advantages in animal genomic selection. According to the amount and type of supervision received during training, machine learning can be classified into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. The main difference among them is whether the input data are labeled. All machine learning methods currently applied in animal genomic selection are supervised. Supervised learning can handle both classification and regression problems, and it requires that the algorithm be given labeled data together with the desired output. In recent years, applications of machine learning in animal genomic selection have grown steadily, with particularly rapid development in dairy and beef cattle. This paper divides machine learning algorithms into three categories (single algorithms, ensemble algorithms, and deep learning) and reviews research progress on each in animal genomic selection. Single algorithms ...