Authors: CHANG Zheng, MENG Jun[1], SHI Yunsheng, MO Fengran (School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, China)
Affiliation: [1] School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023
Source: CAAI Transactions on Intelligent Systems (智能系统学报), 2018, No. 6, pp. 928-934 (7 pages)
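Funding: National Natural Science Foundation of China (61472061); Dalian University of Technology Graduate Education Reform Fund (Jg2017015); Dalian University of Technology Undergraduate Innovation Training Program (2018101410201011019)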
Abstract: Traditional plant lncRNA identification based on a single feature has clear limitations. This paper therefore proposes a multi-feature method that combines the open reading frame (ORF), secondary structure, and k-mer features of RNA sequences. Three classical classifiers (Gaussian naive Bayes, support vector machine, and gradient boosting decision tree) are trained on these features, and their classification results are integrated. Evaluated by cross-validation, the method reached an overall accuracy of 89% on an Arabidopsis thaliana dataset and outperformed the popular prediction tools CPAT, CNCI, and PLEK on the same data. In addition, based on competing endogenous RNA (ceRNA) rules and RNA structure information, lncRNA-microRNA and microRNA-mRNA target pairs were predicted and filtered; the integrated predictions were then used to build an RNA interaction regulatory network, and the regulatory relationships were analyzed to predict the functions of lncRNAs within network modules. Finally, Gene Ontology (GO) term analysis was used to predict the biological regulatory processes in which mRNA-associated lncRNAs may participate and to infer their corresponding functions.
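A minimal, standard-library-only sketch of the multi-feature idea in the abstract may make it more concrete. It rests on several assumptions not given in this record: ORFs are scanned on the forward strand only and must start at ATG, k-mer sizes 1 to 3 are used, and the three classifiers' outputs are combined by simple majority voting. The paper's actual feature set, k values, and integration scheme may differ. All function names (`kmer_frequencies`, `longest_orf`, `extract_features`, `majority_vote`) are hypothetical. Secondary-structure features (for example, minimum free energy from a folding tool such as ViennaRNA's RNAfold) are omitted because they require an external program.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Normalized counts of every possible k-mer over A/C/G/T, in lexicographic order."""
    seq = seq.upper().replace("U", "T")  # treat RNA and cDNA the same way
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1  # windows containing N etc. are ignored
    return [counts[km] / total for km in kmers]


def longest_orf(seq: str) -> int:
    """Length in nt of the longest ATG...stop ORF in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):  # only complete codons
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # include the stop codon
                start = None
    return best


def extract_features(seq: str, ks: tuple[int, ...] = (1, 2, 3)) -> list[float]:
    """Hypothetical fused feature vector: ORF length, ORF coverage, then k-mer frequencies."""
    orf_len = longest_orf(seq)
    features = [float(orf_len), orf_len / max(len(seq), 1)]
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


def majority_vote(*predictions: list[int]) -> list[int]:
    """Combine per-sample 0/1 labels from several classifiers by majority vote.

    With three classifiers there can be no tie; 1 = predicted lncRNA is an assumed convention.
    """
    return [1 if sum(votes) * 2 > len(votes) else 0 for votes in zip(*predictions)]


if __name__ == "__main__":
    print(longest_orf("AUGGCCUAAGG"))           # 9: AUG GCC UAA
    print(len(extract_features("AUGGCCUAAGG")))  # 2 + 4 + 16 + 64 = 86
    print(majority_vote([1, 0, 1], [1, 1, 0], [0, 0, 1]))  # [1, 0, 1]
```

In practice the feature vectors would be passed to three separately trained models, for example scikit-learn's `GaussianNB`, `SVC`, and `GradientBoostingClassifier`, and their per-sequence labels would be combined with a rule such as `majority_vote`. Majority voting is used here only as an illustrative stand-in for the paper's unspecified integration step.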
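Keywords: lncRNA; identification; feature extraction; multi-feature fusion; machine learning; interaction relationship; network construction; function prediction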
Classification number: TP391 [Automation and Computer Technology / Computer Application Technology]