An Association Test for Integrating Transcriptome and Genomic Data


Authors: JIANG Zhen-zhen (江珍珍); LI Na (李娜)

Affiliations: [1] Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; [2] School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; [3] School of Applied Science, Beijing Information Science and Technology University, Beijing 100192, China

Source: Journal of Applied Statistics and Management (《数理统计与管理》), 2024, No. 6, pp. 995-1009 (15 pages)

Funding: National Natural Science Foundation of China (11722113).

Abstract: Integrating expression quantitative trait loci (eQTL) information across tissues can improve the power to identify disease-associated genes and helps in understanding gene regulatory mechanisms. For each gene, the MultiXcan method applies principal component analysis (PCA) to the predicted expression across tissues and tests the gene-phenotype association with the F-test of a multiple linear regression model. Complex diseases usually arise from the joint action of many genes, and when each gene has only a small effect, MultiXcan has low power to detect associated genes, particularly when the sample size is limited. In addition, because MultiXcan builds its model from only the first few principal components, it risks discarding useful information. This paper searches for genes associated with complex diseases by integrating the predicted expression of all genes across tissues. First, PCA is performed separately on each gene's predicted expression across tissues. Unlike MultiXcan, the Lasso is then used to select variables among all principal components, and a multiple linear regression model is fitted to the selected components. Finally, the association between each gene and the phenotype is assessed with a Wald test followed by FDR correction. Numerical results show that the new method substantially improves the ability to identify target genes: it outperforms MultiXcan in most settings, and it needs a much smaller sample size to reach the same power.
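The four steps named in the abstract (per-gene PCA, Lasso selection over all principal components, a linear regression on the selected components, then a per-gene Wald test with FDR correction) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the function names, the penalty `lam`, the chi-square reference distribution for the Wald statistic, the Benjamini-Hochberg procedure as the FDR correction, and the rule that a gene with no selected component gets p = 1 are all assumptions made for illustration. The sketch also ignores the effect of variable selection on the subsequent test, and the data are simulated.

```python
import math
import numpy as np

def pca_scores(X):
    """PC scores of a (samples x tissues) matrix; zero-variance PCs are dropped."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    keep = s > 1e-8 * s.max()
    return U[:, keep] * s[keep]

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b, r = np.zeros(p), y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                 # partial residual without column j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def chi2_sf(x, k):
    """Chi-square upper tail for integer df k, built up by the recurrence
    Q(k+2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    x = float(x)
    if k % 2:
        q, d = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, d = math.exp(-x / 2), 2
    while d < k:
        q += (x / 2) ** (d / 2) * math.exp(-x / 2) / math.gamma(d / 2 + 1)
        d += 2
    return q

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(ranked, 1.0)
    return out

def gene_association(expr, y, lam=0.3):
    """expr: dict gene -> (n x tissues) predicted expression; y: phenotype (n,).
    Returns dict gene -> BH-adjusted Wald p-value."""
    genes = list(expr)
    blocks = [pca_scores(expr[g]) for g in genes]
    owner = np.concatenate([np.full(b.shape[1], i) for i, b in enumerate(blocks)])
    Z = np.hstack(blocks)
    Z = Z / Z.std(axis=0)                       # PC scores are already centred
    yc = y - y.mean()
    sel = np.flatnonzero(lasso_cd(Z, yc, lam) != 0)
    pvals = np.ones(len(genes))                 # assumption: unselected gene -> p = 1
    if sel.size:
        Zs = Z[:, sel]
        beta, *_ = np.linalg.lstsq(Zs, yc, rcond=None)   # refit by least squares
        resid = yc - Zs @ beta
        sigma2 = resid @ resid / (len(y) - sel.size - 1)
        cov = sigma2 * np.linalg.inv(Zs.T @ Zs)
        for i in range(len(genes)):
            idx = np.flatnonzero(owner[sel] == i)
            if idx.size:                        # joint Wald test of this gene's PCs
                b = beta[idx]
                W = b @ np.linalg.solve(cov[np.ix_(idx, idx)], b)
                pvals[i] = chi2_sf(W, idx.size)
    return dict(zip(genes, bh_adjust(pvals)))

# Toy simulation: 5 genes x 3 tissues, only gene "g0" affects the phenotype.
rng = np.random.default_rng(0)
expr = {f"g{i}": rng.normal(size=(300, 3)) for i in range(5)}
y = expr["g0"].sum(axis=1) + rng.normal(size=300)
res = gene_association(expr, y)
```

In this toy setting `res["g0"]` should be far below any usual FDR threshold. A real analysis would choose `lam` by cross-validation and would need to account for the selection step when interpreting the Wald p-values.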

Keywords: Lasso; FDR; power; transcriptome-wide association study; expression quantitative trait loci

Classification number: O212 [Science: Probability Theory and Mathematical Statistics]

 
