Prediction of circRNA protein-coding potential based on a machine-learning method


Authors: WANG Cong; ZHAO Jian; LIU Jing-jing; SONG Xiao-feng [1] (College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210006, China)

Affiliation: [1] College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Source: Journal of Yunnan Minzu University: Natural Sciences Edition, 2020, No. 5, pp. 464-471, 485 (9 pages)

Funding: National Natural Science Foundation of China (61973155; 61571223).

Abstract: Circular RNA (circRNA) is a class of endogenous RNA that is widely present in eukaryotic cells. It lacks a 5' cap and a 3' poly(A) tail and is covalently joined into a closed loop. circRNA was long regarded as a non-coding RNA that cannot be translated. In recent years, however, studies have reported that circRNAs can encode proteins and thereby regulate important biological activities, which has drawn researchers' attention. In this study, a machine-learning approach is applied: based on sequence and structural features of circRNAs, an integrated classification model combining XGBoost, random forest, and support vector machine predicts the protein-coding potential of circRNAs, with an average prediction accuracy of 86.66%. The model offers experimental researchers a reliable reference and may help identify more protein-coding circRNAs.
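
The abstract does not state how the three classifiers are combined into one model. As a minimal sketch only, the snippet below shows two common ways to merge per-model outputs: soft voting (averaging probabilities) and hard voting (majority rule). The function names, the 0.5 threshold, and the example probabilities are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def soft_vote(probabilities, threshold=0.5):
    """Average the per-model coding probabilities and threshold the mean.

    probabilities: dict mapping a model name to its predicted probability
    that a circRNA is protein-coding (hypothetical inputs).
    Returns (mean probability, True if the mean reaches the threshold).
    """
    if not probabilities:
        raise ValueError("need at least one model probability")
    score = mean(list(probabilities.values()))
    return score, score >= threshold


def hard_vote(probabilities, threshold=0.5):
    """Majority rule: a model votes 'coding' if its probability >= threshold."""
    votes = [p >= threshold for p in probabilities.values()]
    return sum(votes) > len(votes) / 2


# Hypothetical outputs of the three base classifiers for a single circRNA.
example = {"xgboost": 0.9, "random_forest": 0.6, "svm": 0.3}
score, is_coding = soft_vote(example)
print(f"soft vote: score={score:.2f}, coding={is_coding}")
print(f"hard vote: coding={hard_vote(example)}")
```

Soft voting lets a confident model outweigh two hesitant ones, while hard voting treats every model equally; which of these (if either) matches the published model cannot be determined from the abstract.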

Keywords: circular RNA; protein coding; machine learning

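Classification codes: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; Q74 [Automation and Computer Technology: Control Science and Engineering]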

 
