Authors: Liu Dong-ning[1], Wang Zi-qi, Zeng Yan-jiao, Wen Fu-yan, Wang Yang (School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China)
Affiliation: [1] School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, Guangdong 510006
Source: Journal of Guangdong University of Technology, 2023, No. 1, pp. 1-9 (9 pages)
Funding: General Program of the National Natural Science Foundation of China (62072120).
Abstract: DNA N6-methyladenine (6-mA) methylation is one of the most important epigenetic modification marks. Aberrant 6-mA sites can affect gene expression and thereby lead to a range of serious diseases, so predicting 6-mA sites is of great significance for understanding disease mechanisms and for treatment. This paper proposes a long short-term memory (LSTM) neural network that predicts gene methylation sites from a composite feature encoding built with the K-mer and one-hot methods. K-mer encoding first increases the information carried by each character of the gene sequence; one-hot encoding then expands the resulting character sequence into a composite encoding matrix. This improved matrix increases the number and variety of feature dimensions the LSTM model can extract from gene sequence data, which improves its performance on gene sequences. Cross-validation experiments show that the method reaches an accuracy of 93.7% on a public benchmark dataset, with sensitivity, specificity, and Matthews correlation coefficient (MCC) of 93.0%, 94.5%, and 0.875, respectively, all better than existing 6-mA prediction methods. On gene datasets from six other species, the area under the receiver operating characteristic curve (AUC) ranges from 0.9055 to 0.9262, indicating that the method is applicable to methylation site prediction in animals, plants, and microorganisms. Finally, the method was used to predict methylation at every base position of the rice gene sequence NC_029258.1. When checked against four different recently published online prediction tools, 86% to 96% of the potential methylation sites it predicted were also supported by those tools, suggesting that the predictions are reliable and that the method can be applied to large-scale methylation site prediction and analysis across species.
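The abstract does not spell out the exact encoding, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the K-mer window slides with stride 1, that each k-mer is mapped to one of 4^k vocabulary indices, and that this index is one-hot encoded, giving one row per time step for an LSTM. The function names `kmer_onehot_encode` and `binary_metrics`, and the default k=3, are hypothetical. The second function computes the accuracy, sensitivity, specificity, and MCC quoted above from confusion-matrix counts using their standard definitions.

```python
import math
from itertools import product

BASES = "ACGT"


def kmer_vocabulary(k: int) -> dict:
    """Map every possible k-mer over A/C/G/T to an integer index (4**k entries)."""
    return {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}


def kmer_onehot_encode(seq: str, k: int = 3) -> list:
    """Hypothetical K-mer + one-hot composite encoding (assumed scheme).

    Slides a length-k window with stride 1 (an assumption), looks up each
    k-mer's vocabulary index, and one-hot encodes it. Returns a matrix of
    shape (len(seq) - k + 1) x 4**k: time steps x features for an LSTM.
    """
    seq = seq.upper()
    vocab = kmer_vocabulary(k)
    n_steps = len(seq) - k + 1
    if n_steps < 1:
        raise ValueError("sequence is shorter than k")
    matrix = []
    for t in range(n_steps):
        kmer = seq[t:t + k]
        if kmer not in vocab:
            raise ValueError(f"unexpected symbol in k-mer {kmer!r}")
        row = [0] * len(vocab)
        row[vocab[kmer]] = 1  # exactly one active feature per time step
        matrix.append(row)
    return matrix


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)  # sensitivity: recall on methylated (positive) sites
    sp = tn / (tn + fp)  # specificity: recall on non-methylated sites
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


if __name__ == "__main__":
    m = kmer_onehot_encode("ACGTAC", k=2)
    print(len(m), len(m[0]))               # 5 time steps, 16 features
    print([row.index(1) for row in m])     # k-mer indices AC, CG, GT, TA, AC
    print(binary_metrics(tp=90, tn=80, fp=20, fn=10))
```

With k=2, "ACGTAC" yields 5 time steps of 16 features each; with k=3 every step would carry 64 features. One-hot encoding the k-mer index, rather than feeding the raw integer, keeps all k-mers equidistant so the network does not read a spurious ordering into the vocabulary.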
Keywords: methylation site prediction; deep learning; long short-term memory network; composite features
CLC number: TP301.6 [Automation and Computer Technology: Computer Systems Architecture]