Adaptive Sequence Length Deep Method for Predicting RNA Secondary Structure


Authors: WU Hong-jie (吴宏杰) [1]; TANG Ye (汤烨); LU Wei-zhong (陆卫忠); CUI Zhi-ming (崔志明); FU Bao-chuan (付保川) [1]; GAO Zhen [1,3]

Affiliations: [1] School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China; [2] Institute of Intelligent Information Processing and Application, Soochow University, Suzhou 215006, China; [3] School of Engineering Technology, McMaster University, Hamilton, Ontario, Canada 45011

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2019, No. 8, pp. 1799-1803 (5 pages)

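Funding: Supported by the National Natural Science Foundation of China (61772357, 61876217, 61672371, 61502329); the Jiangsu Province 333 Talent Project; the Jiangsu Province Six Talent Peaks Project (DZXX-010); and Suzhou Science and Technology Projects (SYG201704, SNG201610)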

Abstract: RNA secondary structure prediction is an important problem in structural bioinformatics. Predicting secondary structures that contain pseudoknots is harder still, because pseudoknot structures are complex. Traditional machine learning methods are constrained by the structure of the learning model and require a fixed number of input features. Most methods therefore truncate sequences of different lengths to a uniform length before training, which both discards useful information and destroys the integrity of the biological sequence. To address this, the paper proposes a deep recurrent neural network (LSTM) model that adapts to sequence length, and constructs a sequence-length adaptive module and a training algorithm so that no truncation is needed. Because the class proportions in real samples are imbalanced, a dynamic weighting method is used to mitigate the imbalance. Comparative experiments against four established methods were then conducted on the authoritative RNA STRAND data set. The results show that the accuracy and Matthews correlation coefficient of the proposed method are 1.6% and 3.3% higher, respectively, than those of the fixed-length LSTM method, and 13.6% and 14.8% higher, respectively, than those of the other four typical methods.
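The two quantities the abstract relies on, class weighting for imbalanced samples and the accuracy/Matthews correlation coefficient pair used for evaluation, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes binary per-base labels (1 = paired, 0 = unpaired) and uses inverse-frequency weights as one plausible reading of "dynamic weighting", since the abstract does not specify the scheme. The function names `class_weights` and `accuracy_and_mcc` and the sample labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def class_weights(labels):
    """Inverse-frequency class weights for one variable-length label sequence.

    Assumption: one plausible form of "dynamic weighting"; the abstract
    does not state the scheme actually used in the paper.
    """
    counts = Counter(labels)
    total = len(labels)
    # Rarer classes receive larger weights. Weights are recomputed for each
    # sequence, so sequences of any length can be used without truncation.
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def accuracy_and_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Hypothetical per-base labels: 1 = paired, 0 = unpaired.
y_true = [1, 1, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(class_weights(y_true))
    print(accuracy_and_mcc(y_true, y_pred))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is commonly reported alongside accuracy when class proportions are imbalanced.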

Keywords: RNA secondary structure prediction; recurrent neural network; dynamic weighting; pseudoknot; base

Classification number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]

 
