Application of Penalized Generalized Linear Models in Genetic Association Studies and Its Software Implementation in R  Cited by: 5

Authors: Zhang Junguo (张俊国)[1], Liu Li (刘丽)[1], Li Lixia (李丽霞)[1], Zhang Min (张敏)[1], Gao Yanhui (郜艳晖)[1]

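Affiliation: [1] Department of Epidemiology and Health Statistics, School of Public Health, Guangdong Pharmaceutical University (广东药学院), 510310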

Source: Chinese Journal of Health Statistics (《中国卫生统计》), 2016, Issue 4, pp. 582-586 (5 pages)

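Funding: National Natural Science Foundation of China (81302493); Social Development Fund of the Guangdong Provincial Department of Science and Technology (2014A020212307); Natural Science Foundation of Guangdong Province (S2013040013590)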

Abstract: Objective: High-dimensional data are increasingly common in genetic association studies. This paper examines the application and software implementation of generalized linear models penalized by ridge, LASSO, and elastic net in genetic association studies, to provide a methodological reference for high-dimensional association analysis. Methods: The principles of penalized generalized linear models and their software implementation are introduced. Using simulated association data for SNPs in linkage equilibrium and in linkage disequilibrium, a penalized logistic model is used to illustrate how the R package glmnet fits generalized linear models. Results: For both the linkage-equilibrium and the linkage-disequilibrium simulated SNP data, LASSO and elastic net gave sparse solutions, selecting the associated SNPs well while excluding the non-associated variables. Ridge estimation kept all variables in the model, so model complexity was high without a corresponding gain in explanatory power. Conclusion: LASSO and elastic net can effectively reduce the dimensionality of high-dimensional genetic association data, selecting variables while also providing parameter estimates, and thereby reduce model complexity. The glmnet package in R flexibly fits various types of penalized generalized linear models and can be widely used in high-throughput genetic association analysis.

Keywords: penalized generalized linear model; genetic association study; LASSO; elastic net; glmnet package

Classification number: R394 [Medicine and Health: Medical Genetics]
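
The contrast the abstract draws between ridge, LASSO, and elastic net can be illustrated with a small sketch. It is not the paper's code and does not use glmnet: it is a pure-Python proximal-gradient fit of a penalized logistic model, assuming the usual elastic-net penalty λ[(1 − α)/2 · Σβ² + α · Σ|β|], where α = 0 gives ridge and α = 1 gives LASSO. The simulated genotypes (independent SNPs coded 0/1/2), the effect sizes, and all function names are hypothetical choices made only for illustration.

```python
import math
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_penalty(beta, lam, alpha):
    """lam * [(1 - alpha)/2 * sum(b^2) + alpha * sum(|b|)]."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * ((1.0 - alpha) / 2.0 * l2 + alpha * l1)


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def fit_penalized_logistic(X, y, lam, alpha, step=0.1, n_iter=300):
    """Proximal gradient descent; the intercept b0 is left unpenalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals p_i - y_i of the current logistic fit.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                for j in range(p)]
        b0 -= step * sum(resid) / n
        # Gradient step, then the elastic-net proximal map:
        # soft-threshold for the L1 part, rescale for the L2 part.
        beta = [soft_threshold(b - step * g, step * lam * alpha)
                / (1.0 + step * lam * (1.0 - alpha))
                for b, g in zip(beta, grad)]
    return b0, beta


def simulate_snps(n, p, maf=0.3, effects=(0.8, -0.8)):
    """Hypothetical data: independent SNPs; only the first two affect y."""
    X, y = [], []
    for _ in range(n):
        row = [float((random.random() < maf) + (random.random() < maf))
               for _ in range(p)]
        eta = sum(e * x for e, x in zip(effects, row))
        y.append(1.0 if random.random() < sigmoid(eta) else 0.0)
        X.append(row)
    return X, y


random.seed(1)
X, y = simulate_snps(n=200, p=10)
for name, alpha in [("ridge", 0.0), ("elastic net", 0.5), ("LASSO", 1.0)]:
    _, beta = fit_penalized_logistic(X, y, lam=0.05, alpha=alpha)
    nonzero = sum(b != 0.0 for b in beta)
    print(f"{name}: {nonzero} of {len(beta)} coefficients are nonzero")
```

The soft-thresholding step is what allows LASSO and elastic net to set coefficients exactly to zero, whereas the ridge update only rescales them. In practice the glmnet package would be used in R instead, with λ chosen by cross-validation rather than fixed as it is here.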

 
