Prediction of Plant lncRNA-encoded Small Peptides Combined with Multi-scale Convolutional Capsule Network


Authors: HU Hehuan; MENG Jun[1]; ZHAO Siyuan; JI Tengqi (School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, China)

Affiliation: [1] School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, Liaoning, China

Source: Journal of Zhengzhou University: Natural Science Edition, 2022, No. 1, pp. 12-18 (7 pages)

Funding: National Natural Science Foundation of China (61872055).

Abstract: Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nt that do not code for proteins. Recent studies have shown, however, that some lncRNAs contain short open reading frames (sORFs) of no more than 300 nt and are able to encode small peptides. This finding has drawn attention to the emerging research field of sORF-encoded peptides (SEPs). To date, SEPs have mostly been studied with biological experiments and traditional machine learning methods. Because biological experiments are costly and time-consuming, and traditional machine learning requires too much manual intervention, a deep learning model incorporating a multi-scale convolutional capsule network is proposed. The model both extracts sequence features thoroughly and clusters those features through the connections between capsules. Model performance was evaluated by 5-fold cross-validation; on the Physcomitrella patens (moss) dataset, the proposed model achieved better classification results than single deep learning models and simple fusion deep learning models. In addition, independent tests on datasets from two other species, Arabidopsis thaliana and Glycine max (soybean), verified that the model generalizes well.
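As an illustration of the sORF definition in the abstract (an open reading frame of no more than 300 nt), the following is a minimal Python sketch of a forward-strand sORF scan. It is not the authors' pipeline. The function name `find_sorfs`, the choice of ATG as the only start codon, the restriction to the three forward frames, and counting the stop codon toward the length are all assumptions made for this example.

```python
# Hypothetical helper, not taken from the paper: scans the three forward
# reading frames for ORFs that start at ATG and end at the first in-frame
# stop codon, keeping only those of at most max_len nucleotides.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_len=300):
    """Return (start, end, orf) tuples for forward-strand ORFs <= max_len nt.

    Assumptions: ATG is the only start codon, the stop codon is counted
    in the ORF length, and the reverse strand is ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # end index is exclusive
                if end - start <= max_len:
                    found.append((start, end, seq[start:end]))
                start = None  # look for the next ORF in this frame
    return found
```

Calling `find_sorfs(transcript)` on an lncRNA transcript string would give candidate sORFs that could then be passed to a classifier; how candidates are actually selected and encoded in the paper is not stated in the abstract.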

Keywords: capsule network; long non-coding RNA; short open reading frame; small peptide; prediction

Classification code: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]

 
