Authors: ZHONG Laimin, LU Weizhong, FU Qiming [1], MA Jieming, CUI Zhiming, WU Hongjie [1]
Affiliations: [1] School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, Jiangsu, China; [2] School of Intelligent Engineering, Xi'an Jiaotong-Liverpool University, Suzhou 215123, Jiangsu, China
Source: Microelectronics & Computer (《微电子学与计算机》), 2023, Issue 12, pp. 1-9 (9 pages)
Funding: National Natural Science Foundation of China (62372318, 62073231, 62176175).
Abstract: Proteins are closely tied to life activities, and deoxyribonucleic acid (DNA)-binding proteins, a special class of proteins, play an irreplaceable role in them. Studying DNA-binding proteins therefore has considerable practical significance and broad research prospects. Traditional experimental biotechnology is accurate, but it is costly, inefficient, and demanding of equipment, so it is poorly suited to today's need to study proteins at scale. Machine learning methods partly offset these shortcomings, but they handle data far less efficiently and conveniently than deep learning. This study proposes a deep learning framework based on a bidirectional parallel long short-term memory neural network (BiLSTM) and a Transformer to predict DNA-binding proteins. The model extracts features from the protein sequence and from evolutionary information, then fuses the two sets of features for training and testing. It broadens the approaches available for protein feature extraction and offers a reference for using Transformer encoder blocks to extract global protein features. On the PDB2272 dataset, accuracy (ACC) and the Matthews correlation coefficient (MCC) improve by 2.64% and 5.51%, respectively, over the PDBP_Fusion model, indicating an advantage over that baseline.
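The abstract reports results as ACC and MCC. As a reminder of how these two metrics are usually defined for a binary classifier, here is a minimal sketch computed from confusion-matrix counts. The function names and the toy labels are illustrative assumptions, not taken from the paper, and the paper's own evaluation code is not shown in this record.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = DNA-binding, 0 = non-binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print("ACC:", accuracy(*counts))
    print("MCC:", mcc(*counts))
```

Unlike ACC, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for classifiers of this kind.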
Keywords: Transformer; bidirectional long short-term memory network; DNA-binding protein; feature extraction; deep learning
Classification number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]