Prediction of Escherichia Coli Promoter Based on Sequence Flexibility Parameters


Authors: XIE Ya-ru; DONG Zhi-fei; YANG Jia-he; ZHOU Xiao-jun; LI Qian-zhong (School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China)

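Affiliation: [1] Department of Electronic Science and Technology, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021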

Source: Journal of Inner Mongolia University: Natural Science Edition, 2018, No. 6, pp. 620-628 (9 pages)

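Funding: National Natural Science Foundation of China (Nos. 31460234, 61361015, 11647310); Inner Mongolia University Undergraduate Innovation and Entrepreneurship Program (No. 201610126027)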

Abstract: Promoter sequences of Escherichia coli K-12 were taken as the positive set, and sequences from coding regions (Coding) and intergenic convergent regions (CON) were taken as two control negative sets, to build a database. Dinucleotide position conservation was analyzed for the experimentally determined 165 σ38, 94 σ54, and 600 σ70 promoter sequences, giving the fluctuation of the conservation parameter along the sequence positions. A scoring function for sequences was built by combining a single flexibility structural parameter with the neighbour (dinucleotide) position-correlation weight matrix of each set, and the three types of data sets were tested separately. In the self-consistency test, the prediction accuracy reached 98% for both σ38 and σ54 promoters and 88% for σ70 promoters. Receiver operating characteristic (ROC) curves were drawn to determine the optimal thresholds for ten-fold cross-validation on the three types of data sets. At the optimal thresholds, the prediction accuracy (Ac) exceeded 80% for both σ38 and σ70 promoters; in particular, Ac was 88% on the data set with σ70 promoters as the positive set and coding sequences as the negative set (the Prom-Coding data set). For σ54 promoters, Ac was 76% on the data set with intergenic convergent sequences as the negative set (the Prom-CON data set) and 87% on the Prom-Coding data set. In the jackknife test on the σ38 and σ54 data sets, Ac exceeded 80% in both cases.

Key words: Escherichia coli K-12; neighbour position-correlation weight matrix; conservation parameter; flexibility parameter; ROC curve

Classification: Q61 [Biology - Biophysics]
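
The scoring scheme summarized in the abstract (a dinucleotide position weight matrix combined with a flexibility parameter, with a cutoff read off the ROC curve) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact scoring function or the threshold criterion, so the log-odds weights with a uniform background, the product `flexibility * (positive weight - negative weight)`, and the choice of the threshold that maximizes TPR - FPR are all assumptions. The function names, the toy sequences, and the all-ones flexibility table are hypothetical placeholders; a real run would use aligned promoter and control sequences and a published dinucleotide flexibility table.

```python
from itertools import product
from math import log

# The 16 dinucleotides over the DNA alphabet.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def build_weight_matrix(seqs, pseudocount=1.0):
    """Per-position log-odds weights for dinucleotides.

    Assumes equal-length, uppercase A/C/G/T sequences and a uniform
    background of 1/16 (an assumption, not taken from the paper).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for i in range(length - 1):
        counts = {d: pseudocount for d in DINUCS}
        for s in seqs:
            counts[s[i:i + 2]] += 1
        total = sum(counts.values())
        matrix.append({d: log((c / total) * 16) for d, c in counts.items()})
    return matrix


def score(seq, pos_matrix, neg_matrix, flexibility):
    """Hypothetical scoring function: flexibility-weighted sum of the
    difference between positive-set and negative-set weights."""
    total = 0.0
    for i in range(len(seq) - 1):
        d = seq[i:i + 2]
        total += flexibility[d] * (pos_matrix[i][d] - neg_matrix[i][d])
    return total


def best_threshold(pos_scores, neg_scores):
    """Scan every observed score as a cutoff (predict promoter if
    score >= cutoff) and return the one maximizing TPR - FPR."""
    best_t, best_j = None, -1.0
    for t in sorted(set(pos_scores) | set(neg_scores)):
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t


# Toy data for illustration only; these are not real promoter sequences.
pos = ["TATAAT", "TATAAT", "TATACT"]
neg = ["GCGCGC", "GCGCGG", "CCGCGC"]
flex = {d: 1.0 for d in DINUCS}  # placeholder flexibility values

pm = build_weight_matrix(pos)
nm = build_weight_matrix(neg)
pos_scores = [score(s, pm, nm, flex) for s in pos]
neg_scores = [score(s, pm, nm, flex) for s in neg]
threshold = best_threshold(pos_scores, neg_scores)
```

In this sketch the matrices are trained and scored on the same sequences, which corresponds to a self-consistency test. For ten-fold cross-validation or a jackknife test, the matrices and the threshold would be rebuilt on each training split and applied only to the held-out sequences.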

 
