Authors: XIE Dongmei (谢冬梅), BIAN Xinye (边昕烨), YU Lianfei (于连飞), LIU Wenbo (刘文博), WANG Ziling (王子灵), QU Zhijian (曲志坚) [1], YU Jiafeng (于家峰)
Affiliations: [1] School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255049, China; [2] Institute of Biophysics, Dezhou University (Shandong Key Laboratory of Biophysics), Dezhou, Shandong 253023, China
Source: Journal of Computer Applications (《计算机应用》), 2025, No. 2, pp. 546-555 (10 pages)
Funding: Youth Innovation Team Development Program of Shandong Higher Education Institutions (2019KJN048).
Abstract: Small open reading frames (sORFs) play a critical role in various biological processes, and accurately distinguishing coding from non-coding sORFs is an important and challenging task in genomics. Most existing algorithms for predicting coding sORFs rely heavily on handcrafted features derived from prior biological knowledge and lack generality, and the variable lengths of raw sORF sequences prevent them from being fed directly into a prediction model. To address both problems, DeepsORF, an end-to-end deep learning framework based on the sORF-Graph graph encoding method, is proposed for predicting coding sORFs. First, sORF-Graph encodes every sORF sequence into a corresponding graph, with sequence information encoded as graph element features, which standardizes the input sequences. Second, a flow attention mechanism based on convolution and residual connections is introduced to capture long-range interactions between bases within sORFs, so that sORF features are represented more effectively and the model's prediction accuracy improves. Experimental results show that DeepsORF improves performance on all six independent test sets. Compared with csORF-finder, DeepsORF improves accuracy, Matthews correlation coefficient (MCC), and precision by 9.97, 19.49, and 13.07 percentage points, respectively, on the D. melanogaster nonCDS-sORFs test set, which validates the effectiveness and good generalization ability of DeepsORF in identifying coding and non-coding sORFs.
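The three metrics reported in the abstract (accuracy, MCC, and precision) can all be computed from the confusion matrix of a binary classifier. The following is a minimal sketch using the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, and MCC from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives (here, "positive" would
    mean a coding sORF). The counts used below are illustrative only.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Precision is undefined when nothing is predicted positive; return 0.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # MCC denominator is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "mcc": mcc}


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
print(metrics)
```

A gain stated in percentage points, as in the abstract, is the plain difference between two such values after each is expressed as a percentage, not a relative change.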
Keywords: small open reading frame; coding sORFs; end-to-end; graph encoding; flow attention
Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]