Individual Random Walks for Gene-Phenotype Association Analysis

Cited by: 1


Authors: TAN Hao-jiang; WANG Jun [2]; YU Guo-xian; CHEN Jian; GUO Mao-zu

Affiliations: [1] School of Software, Shandong University, Jinan, Shandong 250101, China; [2] Joint Centre for Artificial Intelligence Research, Shandong University, Jinan, Shandong 250101, China; [3] College of Agronomy and Biotechnology, China Agricultural University, Beijing 100083, China; [4] College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

Source: Acta Electronica Sinica, 2024, No. 5, pp. 1619-1632 (14 pages)

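Funding: National Natural Science Foundation of China (No.62031003, No.62072380); Fundamental Research Funds for the Central Universities, Shandong University (No.2020GN061).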

Abstract: Association analysis between genes and phenotypes is crucial for revealing the inherent genetic associations of organisms. Random walk algorithms can fuse multi-omics data, aggregate the label information of first-order or higher-order neighbors, and complete the association information between different nodes in a network, which improves the accuracy of association prediction and helps uncover potential genetic associations between genes and phenotypes. However, existing random walk algorithms usually treat every node equally and ignore differences in node importance, so unimportant nodes propagate excessively and model performance suffers. To address this, the paper proposes an individual multiple random walks (iMRW) algorithm based on multi-omics data fusion. On a multi-omics heterogeneous network built from gene, miRNA, and phenotype nodes, iMRW uses the network topology to design an individual multiple random walks strategy that assigns different walk lengths to nodes of different importance. It combines Gaussian interaction profile kernel similarity with random walks to complete the information on nodes and on the associations between them, and finally fuses the multi-source gene-phenotype association matrices to obtain the gene-phenotype association prediction matrix. Under different experimental settings, iMRW achieves better prediction performance than the mainstream algorithms it is compared with. Case studies on maize photosynthetic ability and starch content further confirm the usefulness and effectiveness of iMRW in identifying potential gene-phenotype associations.

Keywords: gene-phenotype association; random walk; heterogeneous network; multi-omics data fusion; network topology

Classification No.: TP399 [Automation and Computer Technology: Computer Application Technology]
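
The abstract names two building blocks: Gaussian interaction profile kernel similarity, and a random walk in which nodes of different importance walk for different numbers of steps. The sketch below illustrates both on a toy gene-phenotype matrix. It is not the authors' iMRW implementation. The kernel follows the commonly used definition with the bandwidth normalised by the mean squared profile norm. The function names, the degree-based step rule in `steps_from_degree`, the restart weight `alpha`, and the toy matrix `A` are all assumptions made for illustration; the abstract does not specify how importance or step length is computed.

```python
import numpy as np

def gip_kernel(assoc):
    """Gaussian interaction profile kernel between the rows of `assoc`."""
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = (assoc ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    # Bandwidth normalised by the mean squared profile norm (gamma' = 1).
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def steps_from_degree(sim, min_steps=1, max_steps=4):
    """Hypothetical importance rule: better-connected nodes get more steps."""
    degree = sim.sum(axis=1) - np.diag(sim)
    span = degree.max() - degree.min()
    if span == 0:
        return np.full(len(degree), min_steps, dtype=int)
    scaled = (degree - degree.min()) / span
    return (min_steps + np.rint(scaled * (max_steps - min_steps))).astype(int)

def individual_random_walk(sim, assoc, steps, alpha=0.8):
    """Random walk with restart in which row i stops updating after steps[i] steps."""
    assoc = np.asarray(assoc, dtype=float)
    row_sums = sim.sum(axis=1, keepdims=True)
    transition = sim / np.where(row_sums == 0, 1.0, row_sums)  # row-stochastic
    scores = assoc.copy()
    for t in range(1, int(steps.max()) + 1):
        updated = alpha * transition @ scores + (1 - alpha) * assoc
        active = steps >= t          # rows that still have steps left
        scores[active] = updated[active]
    return scores

# Toy data (assumed): 4 genes x 3 phenotypes, 1 = known association.
A = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 1]])
S = gip_kernel(A)                      # gene-gene similarity
steps = steps_from_degree(S)           # per-gene walk length
P = individual_random_walk(S, A, steps)
print(np.round(P, 3))
```

Rows with a small step count keep scores close to their known associations, while rows with more steps absorb more information from similar genes. A full method in the spirit of the abstract would run such walks on the gene, miRNA, and phenotype sides of the heterogeneous network and then fuse the resulting matrices; that part is omitted here.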

 
