Prediction of the strength of Escherichia coli promoters


Authors: Zhou Xuan (周漩)[1], Zhou Xin (周欣)[1], Zhong Zhaojian (钟兆健)[1]

Affiliation: [1] School of Pharmacy, Guangdong Pharmaceutical University (广东药学院药科学院), Guangzhou, Guangdong 510006, China

Source: Computers and Applied Chemistry (《计算机与应用化学》), 2014, No. 1, pp. 101-104 (4 pages)

Abstract: Promoters are key elements of DNA sequences that directly affect transcription and expression. Studying them helps elucidate transcription mechanisms and annotate the function of whole genomes. However, given the vast amounts of genomic data now available, detecting promoters experimentally is time-consuming and labor-intensive, so methods for promoter prediction are of great importance. In this paper, a pseudo-trinucleotide composition based on the discrete wavelet transform is proposed to represent DNA sequences, and it is used with support vector machines to predict the strength of Escherichia coli promoters. First, a two-dimensional DNA walk maps each DNA sequence to two discrete numerical sequences, which are merged into a single one-dimensional numerical sequence. This sequence is then transformed by the discrete wavelet transform, and the transform results are combined with the trinucleotide composition to construct the pseudo-trinucleotide composition. The wavelet function and the decomposition scale are selected by 5-fold cross-validation. The pseudo-trinucleotide composition then serves as the input to a support vector machine for prediction modeling. For the training set, the correlation coefficient R was 0.9830 and the root-mean-square error (RMSE) was 0.0907; for the test set, R was 0.8606 and the RMSE was 0.1014. These results indicate that the model predicts well and that the pseudo-trinucleotide composition based on the discrete wavelet transform effectively captures the order information of bases in a DNA sequence. The method not only predicts the strength of Escherichia coli promoters effectively but also provides a reference for predicting other biological functions of DNA.
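The feature construction described in the abstract can be illustrated with a minimal sketch. The record does not specify the walk directions, the wavelet, the decomposition scale, or how the transform results are combined with the trinucleotide composition, so everything below is an assumption made for illustration: the step table `STEPS`, a hand-written Haar wavelet, concatenation as the way of merging the two walk coordinates, and per-level mean-square summaries as the appended "pseudo" components. All function names are hypothetical and are not the authors' code.

```python
from itertools import product
from math import sqrt

# Assumed walk directions; the abstract does not give the actual mapping.
STEPS = {"A": (-1, 0), "T": (1, 0), "G": (0, 1), "C": (0, -1)}
TRIMERS = ["".join(p) for p in product("ACGT", repeat=3)]  # 64 trinucleotides


def trinucleotide_composition(seq):
    """Relative frequency of each of the 64 overlapping trinucleotides."""
    counts = dict.fromkeys(TRIMERS, 0)
    total = 0
    for i in range(len(seq) - 2):
        trimer = seq[i:i + 3]
        if trimer in counts:
            counts[trimer] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in TRIMERS]


def dna_walk(seq):
    """Cumulative 2D walk; x and y tracks are concatenated (assumed merge)."""
    x = y = 0
    xs, ys = [], []
    for base in seq:
        dx, dy = STEPS.get(base, (0, 0))  # unknown symbols do not move
        x += dx
        y += dy
        xs.append(x)
        ys.append(y)
    return xs + ys


def haar_step(signal):
    """One level of the Haar DWT; odd lengths are padded with the last value."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / sqrt(2) for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / sqrt(2) for i in pairs]
    return approx, detail


def wavelet_summary(signal, levels):
    """Mean square of detail coefficients per level, plus the final approximation."""
    approx = [float(v) for v in signal]
    summary = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        summary.append(sum(d * d for d in detail) / len(detail))
    summary.append(sum(a * a for a in approx) / len(approx))
    return summary


def pseudo_trinucleotide_composition(seq, levels=3):
    """64 trinucleotide frequencies followed by levels + 1 wavelet summaries."""
    seq = seq.upper()
    return trinucleotide_composition(seq) + wavelet_summary(dna_walk(seq), levels)
```

Under these assumptions, `pseudo_trinucleotide_composition(seq, levels=3)` returns a vector of 64 + 4 values per promoter sequence, which could then be passed to a support vector regression model. In the paper the wavelet function and decomposition scale are chosen by 5-fold cross-validation, so `levels` and the wavelet itself would be tuned rather than fixed as they are here.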

Keywords: discrete wavelet transform; pseudo-trinucleotide composition; support vector machine; Escherichia coli promoter

Classification number: TQ015.9 [Chemical Engineering]

 
