An ensemble-based likelihood ratio approach for family-based genomic risk prediction  

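Chinese title (translated): Genomic risk prediction of disease using an ensemble-based likelihood ratio algorithm for family data (article in English)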


Authors: Hui AN, Chang-shuai WEI, Oliver WANG, Da-hui WANG, Liang-wen XU, Qing LU, Cheng-yin YE

Affiliations: [1] Department of Health Management, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China [2] Department of Biostatistics and Epidemiology, University of North Texas Health Science Center, Fort Worth, TX 76107, USA [3] HBI Solutions Inc, Palo Alto, CA 94301, USA [4] Department of Preventive Medicine, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China [5] Department of Epidemiology and Biostatistics, College of Human Medicine, Michigan State University, East Lansing, MI 48824, USA

Source: Journal of Zhejiang University-Science B (Biomedicine & Biotechnology), 2018, Issue 12, pp. 935-947 (13 pages)

Funding: Project supported by the National Natural Science Foundation of China (No. 81402762) and the National Institute on Drug Abuse (Nos. K01DA033346 and R01DA043501), USA

Abstract: Objective: As one of the most popular designs in genetic research, the family-based design is well recognized for its advantages, such as robustness against population stratification and admixture. With vast amounts of genetic data collected from family-based studies, there is great interest in studying the role of genetic markers in risk prediction. This study aims to develop a new statistical approach for family-based risk prediction analysis with improved prediction accuracy compared with existing methods based on family history. Methods: We propose an ensemble-based likelihood ratio (ELR) approach, Fam-ELR, for family-based genomic risk prediction. Fam-ELR incorporates a clustered receiver operating characteristic (ROC) curve method to account for correlations among family samples, and uses a computationally efficient tree-assembling procedure for variable selection and model building. Results: In simulations, Fam-ELR is robust across various underlying disease models and pedigree structures, and performs better than two existing family-based risk prediction methods. In a real-data application to a family-based genome-wide dataset of conduct disorder, Fam-ELR demonstrates its ability to integrate potential risk predictors and interactions into the model for improved accuracy, especially at the genome-wide level. Conclusions: Compared with existing approaches, such as the genetic risk-score approach, Fam-ELR can incorporate genetic variants with small or moderate marginal effects, and their interactions, into an improved risk prediction model.
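Therefore, it is a robust and useful approach for high-dimensional family-based risk prediction, especially for complex diseases with unknown or poorly understood etiology.

Abstract (translated from the Chinese): Objective: As one of the most commonly used designs in genetic research, the family-based design is widely recognized for its advantages, such as the robustness of family data to population stratification and admixture. In disease risk prediction, researchers are keenly interested in how to identify and analyze the role of genetic markers using family-based genetic data. This study aims to develop a new statistical method for genetic risk prediction based on family data. Innovation: The new method is expected to capture genetic factors with small or moderate marginal effects, and their interactions, and to achieve higher prediction accuracy than existing risk prediction methods based on family history or family data. Methods: We propose a new ensemble-based likelihood ratio (ELR) method, Fam-ELR, for genomic disease risk prediction with family data. Fam-ELR uses a clustered receiver operating characteristic (ROC) curve method to account for correlations within family samples, and uses a computationally efficient ensemble of trees for variable selection and model building. Conclusions: In simulations, Fam-ELR was robust across various genetic disease models and pedigree structures, and performed better than two existing family-based risk prediction methods. In a real-data application to a genome-wide family dataset of conduct disorder, Fam-ELR demonstrated its ability to integrate potential risk predictors and their interactions into the model to improve accuracy, especially at the genome-wide level. Compared with existing methods, such as the genetic risk-score approach, Fam-ELR was shown to be able to incorporate genetic variants with small or moderate marginal effects, and their interactions, into an improved risk prediction model. It is therefore a powerful and practical method for high-dimensional genetic risk prediction based on family data, especially for complex human diseases whose etiology is unknown or poorly understood.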
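The abstract states that Fam-ELR relies on a clustered ROC method because members of the same family are correlated. The sketch below is not the authors' procedure; it only illustrates the general concern with a generic stand-in: an empirical (Mann-Whitney) AUC, with its uncertainty estimated by resampling whole families rather than individuals. The function names, the family-level bootstrap, and the toy data are assumptions introduced purely for illustration.

```python
import random
from statistics import pstdev


def empirical_auc(scores, labels):
    """Mann-Whitney AUC: P(case score > control score); ties count as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def family_bootstrap_se(scores, labels, family_ids, n_boot=200, seed=0):
    """Hypothetical helper: standard error of the AUC from a family-level
    bootstrap. Whole families are resampled with replacement, so the
    within-family correlation is preserved in every replicate."""
    rng = random.Random(seed)
    families = {}
    for s, y, f in zip(scores, labels, family_ids):
        families.setdefault(f, []).append((s, y))
    keys = list(families)
    aucs = []
    for _ in range(n_boot):
        sample = [m for _ in keys for m in families[rng.choice(keys)]]
        try:
            aucs.append(empirical_auc([s for s, _ in sample],
                                      [y for _, y in sample]))
        except ValueError:
            continue  # replicate had no cases or no controls; skip it
    if len(aucs) < 2:
        raise ValueError("too few usable bootstrap replicates")
    return pstdev(aucs)


# Toy data (invented): risk scores, disease status, and family membership.
scores = [0.9, 0.4, 0.3, 0.7, 0.5, 0.1]
labels = [1, 1, 0, 1, 0, 0]
family_ids = ["A", "A", "A", "B", "B", "C"]

auc = empirical_auc(scores, labels)
se = family_bootstrap_se(scores, labels, family_ids, n_boot=200, seed=1)
print(f"AUC = {auc:.3f}, family-bootstrap SE = {se:.3f}")
```

Resampling individuals instead of families would treat relatives as independent and would typically understate the uncertainty, which is the kind of problem a clustered ROC analysis is meant to address.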

Keywords: Family-based study; Genetic risk prediction; High-dimensional data

Classification code: Q811.4 [Biology / Bioengineering]

 
