Identification of circular RNAs using genomic sequence features

Original title: 基于序列特征的环状RNA识别 (cited by: 2)


Authors: ZHOU Jing (周晶) [1], XIE Xueying (谢雪英) [2,3], GU Wanjun (顾万君) [2,3]

Affiliations: [1] Research Center for Learning Sciences, Southeast University, Nanjing 210096, China; [2] State Key Laboratory of Bio-electronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, China; [3] National Demonstration Center for Experimental Biomedical Engineering Education (Southeast University), Nanjing 210096, China

Source: Chinese Journal of Bioinformatics (《生物信息学》), 2018, No. 2, pp. 113-118 (6 pages)

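Funding: National Natural Science Foundation of China (61372164; 61471112; 61571109); Jiangsu Province Key Research and Development Program (BE2016002-3); Fundamental Research Funds for the Central Universities (2242017K3DN04)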

Abstract: Circular RNAs (circRNAs) are a recently discovered class of RNAs with important biological functions. Existing circRNA identification tools depend on high-throughput sequencing data. Because of limitations in both the data and the identification strategies, they commonly suffer from low accuracy, poor agreement between methods, and high false positive and false negative rates. To address this problem, we built a model that predicts circRNAs de novo from intrinsic features of the genomic sequence rather than from sequencing data. We selected 100 sequence features related to RNA circularization, including the lengths of the introns flanking the splice sites, the density of A-to-I RNA editing sites, and the pairing score of Alu elements in the flanking introns. Using these features, we built machine learning models, identified circRNAs in the human genome, and compared the classification performance of two machine learning algorithms, random forest (RF) and support vector machine (SVM). The results showed that the selected features can effectively distinguish whether an RNA can circularize, and that individual features contribute differently to the model's predictive power. The RF model performed better than the SVM model.
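To make the idea of flanking-intron sequence features concrete, the following is a minimal Python sketch, not the authors' method. The abstract does not say how the 100 features were computed, so everything here is an assumption for illustration: the function names, the dictionary keys, the default `k=8`, and in particular the pairing score, which uses shared reverse-complementary k-mers as a crude stand-in for the Alu element pairing score named in the abstract.

```python
# Hypothetical feature extraction for the two introns flanking a candidate
# back-splice junction. All names and definitions are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (non-ACGT kept as-is)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> set:
    """Return the set of distinct k-mers in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pairing_score(upstream: str, downstream: str, k: int = 8) -> float:
    """Fraction of distinct upstream k-mers whose reverse complement
    occurs in the downstream intron (a crude proxy for intron pairing)."""
    up_kmers = kmer_set(upstream, k)
    if not up_kmers:
        return 0.0
    down_kmers = kmer_set(downstream, k)
    hits = sum(1 for kmer in up_kmers if reverse_complement(kmer) in down_kmers)
    return hits / len(up_kmers)


def editing_density(n_sites: int, length: int) -> float:
    """Number of annotated A-to-I editing sites per base of intron."""
    return n_sites / length if length else 0.0


def flanking_intron_features(upstream, downstream,
                             up_edit_sites, down_edit_sites, k=8):
    """Assemble a small feature dictionary for one candidate junction."""
    return {
        "upstream_intron_length": len(upstream),
        "downstream_intron_length": len(downstream),
        "upstream_editing_density": editing_density(up_edit_sites, len(upstream)),
        "downstream_editing_density": editing_density(down_edit_sites, len(downstream)),
        "pairing_score": pairing_score(upstream, downstream, k),
    }


if __name__ == "__main__":
    up = "ACGTTGCAAGGC"
    down = reverse_complement(up)  # perfectly complementary toy intron
    print(flanking_intron_features(up, down, up_edit_sites=3, down_edit_sites=0, k=4))
```

Feature dictionaries of this kind, one per candidate junction, could then be passed to any classifier, such as the RF and SVM models that the abstract compares.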

Keywords: circular RNA; sequence features; machine learning; random forest; support vector machine

Classification No.: Q522.6 [Biology: Biochemistry]

 
