Deep fusion model for predicting differential gene expression by histone modification data

Authors: LI Xin (李昕), JIA Tao (贾韬)

Affiliation: [1] College of Computer and Information Science, Southwest University, Chongqing 400715, China

Source: Journal of Computer Applications (《计算机应用》), 2022, No. 11, pp. 3404-3412 (9 pages)

Funding: Supported by the China University Industry-University-Research Innovation Fund of the Ministry of Education (2021ALA03016).

Abstract: Concerning the problems that cell type-specificity (CS) information and the similarity and difference information between cell types are not properly used when predicting Differential Gene Expression (DGE) from large-scale Histone Modification (HM) data, and that the input volume and computational cost are high, a deep learning method named dcsDiff was proposed. Firstly, multiple AutoEncoders (AEs) and Bi-directional Long Short-Term Memory (Bi-LSTM) networks were introduced to reduce the dimensionality of the HM signals and model them to obtain embedded representations. Then, multiple Convolutional Neural Networks (CNNs) were used to mine the combined effects of HMs within each single cell type, as well as the similarity and difference information of each HM and the joint effects of all HMs between the two cell types. Finally, the two kinds of information were fused to predict the DGE between the two cell types. In comparison experiments with DeepDiff on 10 pairs of cell types from the REMC (Roadmap Epigenomics Mapping Consortium) database, dcsDiff increased the Pearson Correlation Coefficient (PCC) of DGE prediction by up to 7.2% and by 3.9% on average, increased the number of accurately detected differentially expressed genes by up to 36 and by 17.6 on average, and reduced the running time by 78.7%. Component analysis experiments confirmed the effectiveness of properly integrating the two kinds of information, and the parameters of dcsDiff were determined experimentally. Experimental results show that dcsDiff can effectively improve the efficiency of DGE prediction.
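To make the two kinds of information and the evaluation metric concrete, below is a minimal, stdlib-only sketch of how the inputs for one gene and the PCC evaluation might be organized. It assumes each cell type gives a gene an HM signal matrix with one row per histone mark and one column per genomic bin. The per-mark view that stacks cell type A, cell type B and their difference A - B, the log2 fold-change target with a pseudocount, and all function names (`pair_features`, `log_fold_change`, `pearson`) are hypothetical illustrations, not dcsDiff's published implementation. The AE, Bi-LSTM and CNN stages are not shown.

```python
import math
from typing import List, Sequence

# Hypothetical layout: rows = histone marks, columns = genomic bins.
Matrix = List[List[float]]


def pair_features(sig_a: Matrix, sig_b: Matrix) -> List[Matrix]:
    """For each histone mark, stack its signal in cell type A, in cell type B,
    and their element-wise difference A - B (an assumed between-cell-type view)."""
    if len(sig_a) != len(sig_b):
        raise ValueError("both cell types need the same histone marks")
    views = []
    for row_a, row_b in zip(sig_a, sig_b):
        if len(row_a) != len(row_b):
            raise ValueError("both cell types need the same number of bins")
        diff = [a - b for a, b in zip(row_a, row_b)]
        views.append([list(row_a), list(row_b), diff])
    return views


def log_fold_change(expr_a: float, expr_b: float, pseudo: float = 1.0) -> float:
    """Assumed DGE target: log2 ratio of expression in A vs. B with a pseudocount."""
    return math.log2((expr_a + pseudo) / (expr_b + pseudo))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and predicted DGE values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy data: two marks x two bins per cell type, three genes' expression pairs.
    print(pair_features([[1.0, 2.0], [3.0, 4.0]], [[0.0, 2.0], [5.0, 1.0]]))
    observed = [log_fold_change(a, b) for a, b in [(15, 3), (1, 7), (4, 4)]]
    predicted = [2.0, -1.5, 0.1]  # made-up model outputs
    print(pearson(observed, predicted))
```

Keeping the raw per-cell-type rows alongside the difference row is one simple way to let a downstream model see both cell type-specific signal and between-cell-type change. How dcsDiff actually arranges these views is not stated in this record.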

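Keywords: histone modification; differential gene expression; cell type-specificity; autoencoder; bidirectional long short-term memory network; information fusion; epigenetics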

CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
