Authors: CHEN Jie, XIAO Jing, REN Wen-long (School of Public Health, Nantong University, Nantong, Jiangsu 226019, China)
Source: Modern Preventive Medicine, 2023, issue 12, pp. 2113-2117, 2122 (6 pages in total)
Funding: National Natural Science Foundation of China (81803330); Natural Science Foundation of Jiangsu Province (BK20180950).
Abstract:
Objective To compare the statistical performance of four commonly used genome-wide association study (GWAS) methods on Monte Carlo simulated data, and the differences between them when applied to real body mass index (BMI) data, so as to provide a reference for choosing a GWAS method appropriately.
Methods Using genotypes from the UK Biobank database, phenotypes were simulated by Monte Carlo with different numbers of quantitative trait nucleotides (QTNs) and different heritabilities. Four methods, BOLT-LMM, FarmCPU, fastGWA and GLM, were used to run GWAS on the simulated data, and their statistical power, false positive rate and running time were evaluated. The four methods were then applied to real BMI data, and the associated genes identified by each method were compared.
Results In the Monte Carlo simulations, BOLT-LMM and FarmCPU had the highest statistical power (taking 1,250 QTNs and a heritability of 0.8 as an example), with areas under the power-versus-false-positive-rate curve (AUC) of 0.5041 and 0.4584, respectively, followed by fastGWA (AUC = 0.3770); GLM was lowest (AUC = 0.3755). GLM was the fastest (7.47 hours), fastGWA was slightly slower (about 11 hours), and FarmCPU and BOLT-LMM took 19.5 and 71.3 times as long as GLM, respectively. In the BMI analysis, fastGWA performed best, identifying 54 previously reported associated genes, compared with 35, 35 and 34 for BOLT-LMM, FarmCPU and GLM, respectively.
Conclusion When analyzing GWAS data from large population cohorts, GLM can be run first to obtain preliminary results quickly; following up with fastGWA, FarmCPU or BOLT-LMM may identify more associated genes. In practice, the results of all four methods can be combined to discover new associated genes.
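The simulate-then-evaluate design described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline, and several parts are assumptions: genotypes are synthetic independent SNPs (minor allele frequency drawn uniformly from 0.05 to 0.5) rather than UK Biobank data; QTN effects are drawn from a standard normal; only GLM is shown, implemented as single-marker linear regression with a large-sample normal approximation for p-values; and the AUC is taken as the area under power versus false positive rate traced over every p-value threshold, which may differ from how the paper defines it. BOLT-LMM, FarmCPU and fastGWA add terms for polygenic background, relatedness or pseudo-QTN covariates and are not reproduced. All function names are hypothetical.

```python
import math
import numpy as np

# Hypothetical sketch, not the authors' code: synthetic genotypes stand in for
# UK Biobank data, and only a GLM-style single-marker regression scan is shown.

def simulate_genotypes(n, m, rng):
    """n individuals x m independent SNPs coded 0/1/2 (assumed MAF 0.05-0.5)."""
    maf = rng.uniform(0.05, 0.5, size=m)
    return rng.binomial(2, maf, size=(n, m)).astype(float)

def simulate_phenotype(G, n_qtn, h2, rng):
    """Pick n_qtn causal SNPs (QTNs) and add noise so that h2 = Vg / (Vg + Ve)."""
    qtn = rng.choice(G.shape[1], size=n_qtn, replace=False)
    effects = rng.normal(0.0, 1.0, size=n_qtn)   # assumed N(0, 1) effect sizes
    g = G[:, qtn] @ effects                       # genetic value per individual
    var_e = g.var() * (1.0 - h2) / h2             # residual variance for target h2
    y = g + rng.normal(0.0, math.sqrt(var_e), size=G.shape[0])
    return y, qtn

def glm_scan(G, y):
    """Fit y = a + b*x + e for each SNP; return slopes and two-sided p-values."""
    n = len(y)
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    sxx = (Gc ** 2).sum(axis=0)
    sxy = Gc.T @ yc
    beta = sxy / sxx
    rss = (yc ** 2).sum() - beta * sxy            # residual sum of squares per SNP
    t = beta / np.sqrt(rss / (n - 2) / sxx)
    # Large-sample normal approximation to the t distribution with n - 2 df.
    p = np.array([math.erfc(abs(ti) / math.sqrt(2.0)) for ti in t])
    return beta, p

def power_fpr_auc(pvals, qtn):
    """Area under power vs false positive rate, sweeping the p-value threshold."""
    is_qtn = np.zeros(len(pvals), dtype=bool)
    is_qtn[qtn] = True
    hits = is_qtn[np.argsort(pvals)]              # SNPs from most to least significant
    power = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    # Trapezoidal rule; exact here because the curve is a staircase.
    return float(np.sum(np.diff(fpr) * (power[1:] + power[:-1]) / 2.0))

rng = np.random.default_rng(2023)
G = simulate_genotypes(n=1000, m=2000, rng=rng)
y, qtn = simulate_phenotype(G, n_qtn=20, h2=0.5, rng=rng)
beta, p = glm_scan(G, y)
auc = power_fpr_auc(p, qtn)
print(f"GLM power-vs-FPR AUC: {auc:.3f}")
```

Repeating the last four lines over many seeds, QTN counts and heritabilities, and substituting each tool's reported p-values for `glm_scan`, would mirror the comparison design in outline; the toy sizes here are far smaller than a biobank-scale run, so the resulting AUC is not comparable to the values in the abstract.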
Keywords: genome-wide association study; Monte Carlo simulation; body mass index; mixed linear model; generalized linear model
CLC number: R195.1 [Medicine and Health: Health Statistics]