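Affiliations: [1] College of Information Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541006; [2] Guangxi Key Laboratory of Embedded Technology and Intelligent System (Guilin University of Technology), Guilin, Guangxi 541006
Source: Journal of Computer Applications (《计算机应用》), 2023, No. 6, pp. 1979-1986 (8 pages)
Funding: National Natural Science Foundation of China (62166014, 62162019); Guangxi Natural Science Foundation (2020GXNSFAA297255); Guangxi Key Laboratory of Embedded Technology and Intelligent System project (2019-01-16).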
Abstract: Most existing computational models for predicting associations between circular RNA (circRNA) and diseases use biological knowledge, such as circRNA- and disease-related data, together with known circRNA-disease association pairs to mine potential associations. However, these models suffer from inherent problems, including the sparsity of the network formed by known associations and the scarcity of negative samples, which leads to poor prediction performance. To address this, inductive matrix completion and a self-attention mechanism were combined with a graph auto-encoder in a two-stage fusion to predict circRNA-disease associations; the resulting model is called GIS-CDA (Graph auto-encoder combining Inductive matrix complementation and Self-attention mechanism for predicting CircRNA-Disease Association). Firstly, the integrated circRNA similarity and the integrated disease similarity were calculated, and a graph auto-encoder was used to learn latent features of circRNAs and diseases, yielding low-dimensional representations. Secondly, the learned features were fed into inductive matrix completion to strengthen the similarity and dependence between nodes. Thirdly, the circRNA feature matrix and the disease feature matrix were integrated into a circRNA-disease feature matrix to improve the stability and accuracy of prediction. Finally, a self-attention mechanism was introduced to extract important features from the feature matrix and reduce the dependence on other biological information. Under five-fold and ten-fold cross-validation, GIS-CDA achieved average Area Under the Receiver Operating Characteristic curve (AUROC) values of 0.9303 and 0.9393, respectively; the five-fold value is 13.19, 35.73, 13.28 and 5.01 percentage points higher than those of KATZHCDA (a KATZ-measure-based model for human circRNA-disease association prediction), DMFCDA (Deep Matrix Factorization for CircRNA-Disease Association), RWR (Random Walk with Restart) and SIMCCDA (a circRNA-disease association prediction model based on speedup inductive matrix completion), respectively. The Area Under the Precision-Recall curve (AUPR) values of GIS-CDA were 0.2271 and 0.2340, respectively; the five-fold value is 21.72, 22.43, 21.96 and 13.86 percentage points higher than those of the same four models, respectively. In addition, ablation experiments and case studies on the circRNADisease, circ2Disease and circR2Disease datasets further verified that GIS-CDA performs well in predicting potential circRNA-disease associations.
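The abstract names a self-attention step applied to the fused circRNA-disease feature matrix but this record does not give its formulation. As a rough illustration only, the sketch below shows generic scaled dot-product self-attention in NumPy. The function name `self_attention`, the projection matrices `Wq`, `Wk`, `Wv`, and all matrix sizes are assumptions made for this example and are not taken from the GIS-CDA paper.

```python
import numpy as np

def self_attention(F, Wq, Wk, Wv):
    """Generic scaled dot-product self-attention over the rows of F.

    F  : (n, d) feature matrix, one row per circRNA-disease pair (assumed layout)
    Wq, Wk, Wv : (d, k) projection matrices (hypothetical, randomly initialised here)
    Returns the attended features (n, k) and the attention weights (n, n).
    """
    Q, K, V = F @ Wq, F @ Wk, F @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
# Hypothetical sizes: 6 circRNA-disease pairs, 8 fused features, 4 attention dims.
F = rng.normal(size=(6, 8))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(8, 4)) for _ in range(3))
out, attn = self_attention(F, Wq, Wk, Wv)
```

Each row of `attn` is a probability distribution over all pairs, so every output row is a weighted mix of the projected features; in a trained model the projections would be learned rather than random.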
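Keywords: graph auto-encoder; inductive matrix completion; self-attention mechanism; circular RNA (circRNA); circRNA-disease association pair
Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]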