A long non-coding RNA prediction method based on random forest

Cited by: 2

Source: Journal of Yangzhou University (Natural Science Edition), 2016, No. 4, pp. 50-53 (4 pages)

Funding: National Natural Science Foundation of China (61301220); Jiangsu Province "Six Talent Peaks" Seventh-Batch High-Level Talent Project (2010-DZXX-149)

Abstract: To improve the accuracy of long non-coding RNA (lncRNA) prediction, an lncRNA prediction method based on the random forest algorithm is proposed. The training dataset is derived from internationally used gene annotations and genome sequences. Features are selected first, and the random forest algorithm is then applied to the dataset containing the feature information for model training; the over-fitting problem encountered when implementing the other algorithms is also addressed. The selected features are: the ratios of 14 trinucleotide sequences (ACG, CCG, CGA, CGC, CGG, CGT, CTA, GCG, GGG, GTA, TAA, TAC, TAG, TCG) to the transcript length; the standard deviation of the stop codon counts across the three reading frames; GC content; protein-coding potential (CDS, CDS length, and ratio of CDS to transcript); transcript length; exon count; average exon length; and conservation score (the average PhastCons score of the transcript). Results of 10-fold cross-validation show that the random-forest-based lncRNA prediction method performs better than other methods, including K-nearest neighbors (K-NN), naive Bayes, and Bayesian network, in terms of true positive rate, precision, recall, F-score, and AUC (area under the curve).
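The sequence-derived features named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of overlapping windows for trinucleotide counting, the choice of population standard deviation, the restriction to the three forward reading frames, and the stop codon set TAA/TAG/TGA are all assumptions made here for illustration. Features that need external data (protein-coding potential, exon count, average exon length, PhastCons score) are left out.

```python
from statistics import pstdev

# The 14 trinucleotides listed in the abstract.
TRINUCLEOTIDES = ["ACG", "CCG", "CGA", "CGC", "CGG", "CGT", "CTA",
                  "GCG", "GGG", "GTA", "TAA", "TAC", "TAG", "TCG"]
# Assumption: the standard stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def trinucleotide_ratios(seq):
    """Count of each listed trinucleotide divided by transcript length.

    Assumption: occurrences are counted over overlapping windows.
    """
    n = len(seq)
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    for i in range(n - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
    return {tri: (c / n if n else 0.0) for tri, c in counts.items()}


def stop_codon_std(seq):
    """Standard deviation of stop codon counts over the three forward frames.

    Assumption: population standard deviation (pstdev) is used.
    """
    per_frame = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        per_frame.append(sum(1 for c in codons if c in STOP_CODONS))
    return pstdev(per_frame)


def gc_content(seq):
    """Fraction of G and C bases in the transcript."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sequence_features(seq):
    """Build a feature dict from a transcript sequence (DNA or RNA letters)."""
    seq = seq.upper().replace("U", "T")
    features = trinucleotide_ratios(seq)
    features["stop_codon_std"] = stop_codon_std(seq)
    features["gc_content"] = gc_content(seq)
    features["transcript_length"] = len(seq)
    return features
```

Calling `sequence_features("ATGTAACGGTAG")` returns a dictionary with 17 entries: the 14 trinucleotide ratios plus `stop_codon_std`, `gc_content`, and `transcript_length`. In a full pipeline these values would be combined with the annotation-derived features and passed to a random forest classifier evaluated by 10-fold cross-validation, which is not shown here.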

Keywords: long non-coding RNA; random forest; gene prediction

Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]
