Identification of Gene-Environment Interaction Using SCAD Penalty


Authors: Xie Wenwei (谢文玮), Li Dongxi (李东喜)

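Affiliations: [1] College of Mathematics, Taiyuan University of Technology, Taiyuan, Shanxi; [2] College of Big Data, Taiyuan University of Technology, Taiyuan, Shanxi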

Source: Advances in Applied Mathematics (《应用数学进展》), 2021, No. 5, pp. 1765-1775 (11 pages)

Abstract: For many complex cancers, neither a single gene effect nor a single environmental effect is enough to predict outcomes well, so identifying the gene-environment interactions associated with complex diseases has become a major challenge for pathology and bioinformatics research on high-dimensional data. To address the high dimensionality, heterogeneity, and censoring of survival data, we propose a method for identifying gene-environment interactions based on the accelerated failure time (AFT) model. The method adopts an objective function that combines a least absolute deviation (LAD) loss with a smoothly clipped absolute deviation (SCAD) penalty. This combination reduces the influence of unbalanced data and selects interaction terms that obey a strong hierarchy between main effects and interaction effects. The objective function is optimized with the concave-convex procedure (CCCP) algorithm. Simulation and empirical studies carried out in R show that the method robustly selects appropriate gene effects and gene-environment interaction effects, with good predictive performance and stability. The method also shrinks the set of candidate variables effectively, so the selected model is parsimonious and easy to interpret.
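The kind of objective the abstract describes, a weighted LAD loss plus a SCAD penalty, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The SCAD formula is the standard one from Fan and Li (2001) with the conventional default a = 3.7. The observation weights, the function names, and the plain sum over coefficients are assumptions made for illustration. The strong-hierarchy structure, the handling of censoring, and the CCCP optimization step are all omitted.

```python
def scad_penalty(beta, lam, a=3.7):
    """Standard SCAD penalty (Fan and Li, 2001) for one coefficient."""
    b = abs(beta)
    if b <= lam:
        # Small coefficients: same as the lasso penalty.
        return lam * b
    if b <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return lam * lam * (a + 1) / 2


def weighted_lad_loss(y, X, beta, weights):
    """Sum of weighted absolute residuals; weights are hypothetical inputs."""
    total = 0.0
    for y_i, x_i, w_i in zip(y, X, weights):
        fitted = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        total += w_i * abs(y_i - fitted)
    return total


def objective(y, X, beta, weights, lam, a=3.7):
    """Weighted LAD loss plus SCAD penalty summed over all coefficients."""
    penalty = sum(scad_penalty(b, lam, a) for b in beta)
    return weighted_lad_loss(y, X, beta, weights) + penalty
```

In an AFT setting, `y` would typically hold log survival times and the rows of `X` would hold gene, environment, and interaction covariates. The flat penalty beyond a·λ is what lets SCAD leave large effects nearly unshrunk while still setting small ones to zero.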

Keywords: variable selection; gene-environment interaction; SCAD penalty; weighted LAD loss function

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
