Comparison of Imputation Accuracy for Different Low-Density SNP Selection Strategies

Authors: LIN YuNong; WANG ZeZhao; CHEN Yan[2]; ZHU Bo[2]; GAO Xue[2]; ZHANG LuPei[2]; GAO HuiJiang[2]; XU LingYang[2]; CAI WenTao; LI YingHao; LI JunYa[2]; GAO ShuXin[1]

Affiliations: [1] College of Animal Science and Technology, Inner Mongolia University for the Nationalities, Tongliao 028042, Inner Mongolia [2] Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193 [3] Tongliao Jingyuan Breeding Cattle Breeding LLC, Tongliao 028006, Inner Mongolia

Source: Scientia Agricultura Sinica, 2023, Issue 8, pp. 1585-1593 (9 pages)

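Funding: Special Project for Industrial Innovation and Entrepreneurship Talent Teams, Fifth Batch of the Inner Mongolia Autonomous Region "Grassland Talents" Program; General Program of the Inner Mongolia Natural Science Foundation (2019MS03077); Inner Mongolia Autonomous Region Science and Technology Plan Project (KJXM2020002-05); Young Scientists Fund (32102505).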

Abstract: 【Objective】To reduce genotyping costs and support low-cost genomic selection in Huaxi cattle, this study made a first attempt at designing a low-density genotyping chip that supports imputation to high-density genotypes. Two marker selection methods were used to choose representative SNP sets of graded densities from the high-density chip loci of the Huaxi cattle reference population, and the low-density data were then imputed to high density under identical imputation parameters for subsequent genomic studies. The study compared imputation accuracy and concordance among the marker sets and described the effects of four factors on imputation results: marker selection method, marker density, minor allele frequency (MAF), and reference population size. The results provide a reference for designing a low-density SNP imputation chip for Huaxi cattle. 【Method】After quality control, the remaining 1,233 Huaxi cattle were randomly divided into a reference population (986 animals) and a validation population (247 animals). Two marker selection methods, equidistance (EQ) and high MAF (HM), were each used to select 16 SNP sets of different densities from the Illumina Bovine HD chip loci of the reference population, giving 32 SNP sets in total. Each low-density set was then imputed to the 770K density level in the validation population using Beagle (v5.1). Imputation accuracy and concordance were calculated from the true and imputed genotypes, and the factors influencing imputation performance were analyzed. 【Result】The 32 low-density SNP sets contained between 100 and 16,000 markers, with a maximum window of 24,176 kb and a minimum window of 151 kb. As marker density increased, the imputation concordance and accuracy of both EQ and HM rose steadily, but with diminishing gains, and both levelled off once set density exceeded 12K. Imputation accuracy for both methods was highest at 16K SNP density (r^2_EQ = 0.8801, r^2_MAF = 0.8696). Below 11K, HM gave higher concordance than EQ at every density level, whereas above 11K, EQ showed the advantage. Accuracy followed a similar trend: below 10K, HM retained the higher imputation accuracy, while above 10K, EQ was more accurate, and EQ accuracy stabilized once density exceeded 12K. The study also found that, compared with low-MAF marker loci …
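The abstract names the two selection strategies and the two evaluation metrics without giving their formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that EQ picks the SNP nearest the midpoint of each equal-width window, that HM picks the highest-MAF SNP within each such window, that concordance is the fraction of exactly matching genotype calls, and that r^2 is the squared Pearson correlation between true and imputed genotypes coded 0/1/2. All function names are hypothetical, and the Beagle imputation step itself is not shown.

```python
from statistics import mean

def select_equidistant(positions, n_markers):
    """EQ sketch (assumed rule): one SNP per equal-width window, nearest the midpoint.
    `positions` must be sorted and span more than one base pair."""
    lo, hi = positions[0], positions[-1]
    width = (hi - lo) / n_markers
    chosen = set()
    for w in range(n_markers):
        mid = lo + (w + 0.5) * width
        chosen.add(min(range(len(positions)), key=lambda i: abs(positions[i] - mid)))
    return sorted(chosen)

def select_high_maf(positions, mafs, n_markers):
    """HM sketch (assumed rule): one SNP per equal-width window, the one with highest MAF."""
    lo, hi = positions[0], positions[-1]
    width = (hi - lo) / n_markers
    best = {}  # window index -> index of best SNP seen so far
    for i, (pos, maf) in enumerate(zip(positions, mafs)):
        w = min(int((pos - lo) / width), n_markers - 1)  # last SNP falls in last window
        if w not in best or maf > mafs[best[w]]:
            best[w] = i
    return sorted(best.values())

def concordance(true_g, imputed_g):
    """Fraction of genotype calls (coded 0/1/2) that match exactly."""
    return mean([1.0 if t == p else 0.0 for t, p in zip(true_g, imputed_g)])

def r_squared(true_g, imputed_g):
    """Squared Pearson correlation between true and imputed genotypes."""
    mt, mp = mean(true_g), mean(imputed_g)
    cov = sum((t - mt) * (p - mp) for t, p in zip(true_g, imputed_g))
    vt = sum((t - mt) ** 2 for t in true_g)
    vp = sum((p - mp) ** 2 for p in imputed_g)
    if vt == 0 or vp == 0:
        return float("nan")  # undefined for a monomorphic marker
    return cov * cov / (vt * vp)

# Toy data (made up for illustration, not from the study).
positions = [100, 200, 300, 400, 500, 600, 700, 800]
mafs = [0.05, 0.40, 0.10, 0.45, 0.30, 0.02, 0.25, 0.50]
eq_idx = select_equidistant(positions, 2)
hm_idx = select_high_maf(positions, mafs, 2)
```

On this toy input the two rules select different markers from the same windows, which is the kind of contrast the study evaluates across its 16 density levels. In practice the metrics would be computed per SNP in the validation population and then averaged.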

Keywords: imputation accuracy; low-density SNP chip; Huaxi cattle; linkage disequilibrium; minor allele frequency

Classification No.: S823 [Agricultural Sciences: Animal Husbandry]

 
