Authors: ZHOU Jing[1], XIE Xueying[2,3], GU Wanjun[2,3]
Affiliations: [1] Research Center for Learning Sciences, Southeast University, Nanjing 210096, China; [2] State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, China; [3] National Demonstration Center for Experimental Biomedical Engineering Education (Southeast University), Nanjing 210096, China
Source: Chinese Journal of Bioinformatics (《生物信息学》), 2018, No. 2, pp. 113-118 (6 pages)
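Funding: National Natural Science Foundation of China (61372164; 61471112; 61571109); Key Research and Development Program of Jiangsu Province (BE2016002-3); Fundamental Research Funds for the Central Universities (2242017K3DN04)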
Abstract: Circular RNAs (circRNAs) are a recently discovered class of RNAs with important biological functions. Existing circRNA identification tools depend on high-throughput sequencing data. Because of limitations in both the data and the way these tools detect circRNAs, they commonly suffer from low accuracy, poor agreement between different methods, and high false positive and false negative rates. To address this problem, we built a model that predicts circRNAs de novo from intrinsic features of the genomic sequence rather than from sequencing data. We selected 100 sequence features related to RNA circularization, including the lengths of the introns flanking the splice sites, the density of A-to-I RNA editing sites, and the pairing score of Alu elements in the flanking introns; built machine learning models; identified circRNAs in the human genome; and compared the classification performance of two machine learning methods, random forest (RF) and support vector machine (SVM). The results show that the selected features effectively distinguish RNAs that circularize from those that do not, and that different features contribute differently to the model's predictive power. RF classified circRNAs more effectively than SVM.
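The abstract judges circRNA identification methods by accuracy, false positive rate and false negative rate, and compares two classifiers (RF and SVM) on the same task. As a minimal sketch of how those rates are computed for a binary "forms a circRNA / does not" prediction, the hypothetical helper below derives them from a confusion matrix. It is an illustration under that assumption, not the authors' evaluation code, and the labels in the usage lines are toy values, not data from the study.

```python
from typing import Dict, Sequence


def classification_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: accuracy, false positive rate and false negative rate.

    Labels are binary: 1 = the sequence forms a circRNA, 0 = it does not.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1          # true circRNA, correctly called
        elif t == 0 and p == 1:
            fp += 1          # non-circRNA wrongly called a circRNA
        elif t == 0 and p == 0:
            tn += 1          # non-circRNA correctly rejected
        elif t == 1 and p == 0:
            fn += 1          # true circRNA that was missed
        else:
            raise ValueError("labels must be 0 or 1")
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # FPR = FP / (FP + TN): share of negatives wrongly reported as circRNAs
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        # FNR = FN / (FN + TP): share of true circRNAs that were not detected
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


if __name__ == "__main__":
    # Toy labels and predictions from two hypothetical classifiers (not study data).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    model_a = [1, 1, 0, 0, 0, 0, 1, 0]
    model_b = [1, 0, 0, 0, 1, 0, 1, 0]
    print(classification_rates(y_true, model_a))  # {'accuracy': 0.875, 'fpr': 0.0, 'fnr': 0.25}
    print(classification_rates(y_true, model_b))  # {'accuracy': 0.625, 'fpr': 0.25, 'fnr': 0.5}
```

Scoring both classifiers against the same labels, as in the usage lines, is what makes a comparison such as RF versus SVM meaningful; a real evaluation would additionally use held-out or cross-validated predictions rather than a single toy list.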