Authors: PENG Bao-Cheng (彭宝成); ZHANG Xiao-Wei (张晓炜) [2]; LIU Yang (刘暘); FAN Guo-Liang (樊国梁) [1]
Affiliations: [1] School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China; [2] Department of Rheumatology and Immunology, the First Affiliated Hospital, Inner Mongolia Medical University, Hohhot 010050, China
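Source: Progress in Biochemistry and Biophysics (《生物化学与生物物理进展》), 2022, No. 7, pp. 1334-1347 (14 pages)
Funding: Supported by the National Natural Science Foundation of China (62063024), the Scientific Research Project of Higher Education Institutions of Inner Mongolia Autonomous Region (NJZY20005), and the Inner Mongolia University Undergraduate Innovation and Entrepreneurship Training Program (201912240).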
Abstract:
Objective: Prediction models based on position-specific scoring matrices (PSSM) have achieved good results, and various PSSM-based optimization methods continue to be developed, but their accuracy remains relatively low. To further improve prediction accuracy, this paper studies a classifier based on convolutional neural networks (CNN).
Methods: PSSM is used to convert promoter sequences into numeric matrices, which are then classified by a CNN. The three types of promoter sequences (Sigma38, Sigma54 and Sigma70) of Escherichia coli K-12 (E. coli K-12, hereinafter Escherichia coli) are used as the positive set, and sequences from the Coding and Non-coding regions of Escherichia coli are used as the negative set.
Results: In the two-class prediction of Escherichia coli promoters, the accuracy reaches 99%, and the success rate of promoter prediction is close to 100%. In the three-class classification of Sigma38, Sigma54 and Sigma70 promoters, the prediction accuracy is 98%, and the accuracy for each type of sequence is 98% or higher. Finally, in the four-class classification of the Sigma38, Sigma54 and Sigma70 promoters together with either Coding or Non-coding region sequences, the prediction accuracy is 0.98. In ten-fold cross-validation on balanced samples of the three Sigma promoters, the prediction accuracy exceeds 0.95 in every case, the Hamming distance is 0.016, and the Kappa coefficient is 0.97.
Conclusion: Compared with other classification algorithms such as the support vector machine (SVM), the CNN classifier performs better, and because of this advantage the sequence encoding scheme can also be simplified.
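The abstract says that PSSM turns each promoter sequence into a numeric matrix before CNN classification, but it does not give the exact construction. The following is a minimal sketch of one common log-odds form of a PSSM, assuming equal-length DNA sequences, a uniform background frequency of 0.25, and a pseudocount of 1. The function names `build_pssm` and `encode`, the toy sequences, and the choice to place each score in the column of the observed base are illustrative assumptions, not the authors' method.

```python
import math

ALPHABET = "ACGT"  # assumed DNA alphabet


def build_pssm(sequences, pseudocount=1.0, background=0.25):
    """Return one dict of log-odds scores per position (hypothetical helper).

    Assumes all sequences have the same length and a uniform background.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    pssm = []
    for pos in range(length):
        column = [s[pos] for s in sequences]
        # Pseudocounts keep the frequency of unseen bases above zero.
        total = len(column) + pseudocount * len(ALPHABET)
        row = {}
        for base in ALPHABET:
            freq = (column.count(base) + pseudocount) / total
            row[base] = math.log2(freq / background)
        pssm.append(row)
    return pssm


def encode(sequence, pssm):
    """Encode a sequence as an L x 4 matrix (assumed layout).

    Row i holds the PSSM score of the observed base in that base's
    column and 0.0 in the other three columns.
    """
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the PSSM length")
    matrix = []
    for pos, base in enumerate(sequence):
        matrix.append([pssm[pos][b] if b == base else 0.0 for b in ALPHABET])
    return matrix


# Toy training set; real promoter sequences would be much longer.
training = ["ACGT", "ACGA", "ACGT"]
pssm = build_pssm(training)
matrix = encode("ACGT", pssm)
for row in matrix:
    print([round(v, 3) for v in row])
```

A matrix of this shape could be passed to a CNN as a single-channel input. The network architecture itself is not described in the abstract and is left out here.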