LncRNA recognition by fusing multiple features and its function prediction (多特征融合的lncRNA识别与其功能预测)  Cited by: 5


Authors: CHANG Zheng (常征); MENG Jun (孟军) [1]; SHI Yunsheng (施云生); MO Fengran (莫冯然) (School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, China)

Affiliation: [1] School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, Liaoning, China

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2018, Issue 6, pp. 928-934 (7 pages)

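Funding: National Natural Science Foundation of China (61472061); Dalian University of Technology Graduate Teaching Reform Fund (Jg2017015); Dalian University of Technology Undergraduate Innovation Training Program (2018101410201011019)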

Abstract: To address the limitations of traditional plant lncRNA identification based on a single feature, this paper proposes a method that fuses multiple features of RNA sequences: the open reading frame, secondary structure, and k-mers. Three classical classification models (Gaussian naive Bayes, support vector machine, and gradient boosting decision tree) are trained on these features, and their classification results are integrated. Model performance was evaluated by cross-validation. Overall accuracy reached 89% on the Arabidopsis thaliana dataset, and overall performance exceeded that of the popular prediction tools CPAT, CNCI, and PLEK. In addition, based on endogenous competition rules and RNA structure information, targets were predicted and filtered for lncRNA-microRNA and microRNA-mRNA pairs, the predictions were integrated to build an interaction network, and the functions of the lncRNAs in the network modules were predicted. Gene Ontology (GO) term analysis was then used to predict the biological regulatory processes in which mRNA-associated lncRNAs may participate and to infer their corresponding functions.
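The feature fusion and result integration described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the abstract does not state the k values, the ORF definition, or the integration rule, so the sketch uses k = 1 to 3, the longest forward-strand AUG-to-stop ORF, and a simple majority vote. Secondary-structure features are omitted because they would normally come from an external RNA folding tool. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def kmer_frequencies(seq, k):
    """Normalized frequencies of all 4**k k-mers, in lexicographic order."""
    seq = seq.upper().replace("T", "U")
    windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing ambiguous bases (e.g. N) are counted in `windows`
    # but have no slot in the output vector.
    return [counts["".join(p)] / windows for p in product(BASES, repeat=k)]


def longest_orf_length(seq):
    """Length (nt, stop codon included) of the longest AUG...stop ORF
    over the three forward reading frames."""
    seq = seq.upper().replace("T", "U")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def fuse_features(seq, ks=(1, 2, 3)):
    """Concatenate ORF features and k-mer frequencies into one vector."""
    orf = longest_orf_length(seq)
    features = [orf, orf / max(len(seq), 1)]  # ORF length and ORF coverage
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


def majority_vote(predictions):
    """Combine 0/1 labels from several classifiers, one list per classifier.
    Majority voting is an assumed integration rule."""
    return [1 if 2 * sum(votes) > len(votes) else 0
            for votes in zip(*predictions)]
```

In such a setup, each sequence's fused vector would be passed to the three classifiers, and `majority_vote` would take their per-sample label lists (for example, one list each from the Gaussian naive Bayes, SVM, and gradient boosting models) to produce the final label.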

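Keywords: lncRNA; recognition; feature extraction; multi-feature fusion; machine learning; interaction relationship; network construction; function prediction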

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
