Source: Computers and Applied Chemistry (《计算机与应用化学》), 2014, No. 1, pp. 101-104 (4 pages)
Abstract: Promoters are key elements of DNA sequences that initiate and directly affect transcription and gene expression, and studying them is important both for elucidating transcription mechanisms and for annotating genome function. However, detecting promoters experimentally is expensive and time-consuming, so there is a need for prediction methods that can evaluate sequences rapidly and accurately. In this paper, a pseudo-trinucleotide composition based on the discrete wavelet transform (DWT) is proposed to represent DNA sequences, and it is used as the input of support vector machine (SVM) models that predict the strength of Escherichia coli promoters. First, a two-dimensional DNA walk maps each DNA sequence to two discrete numeric sequences, which are merged into a single one-dimensional numeric sequence. This sequence is then transformed by the DWT, and the resulting coefficients are combined with the trinucleotide composition to construct the pseudo-trinucleotide composition. The wavelet function and the wavelet decomposition scale are selected by 5-fold cross-validation. Finally, the pseudo-trinucleotide composition serves as the input parameters of the SVM prediction model.
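The feature construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the step vectors in the hypothetical `STEPS` table, merging the x and y series by concatenation, the Haar wavelet, the number of levels, summarizing each level by its mean absolute detail coefficient, and the Chou-style `weight` used to combine those terms with the 64 trinucleotide frequencies are all assumptions, since the abstract does not specify them.

```python
import math
from itertools import product

# Hypothetical 2D DNA-walk step vectors; the paper's exact mapping is not given.
STEPS = {"A": (-1, 0), "T": (1, 0), "G": (0, 1), "C": (0, -1)}
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]  # 64 trinucleotides


def dna_walk_1d(seq):
    """Cumulative 2D walk; the x and y series are concatenated into one
    1D numeric sequence (assumed merge rule)."""
    x, y = 0, 0
    xs, ys = [], []
    for base in seq.upper():
        dx, dy = STEPS[base]
        x += dx
        y += dy
        xs.append(x)
        ys.append(y)
    return xs + ys


def haar_dwt(signal, levels):
    """Multi-level Haar DWT returning the detail coefficients of each level.
    Haar stands in for whichever wavelet 5-fold CV would select."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:  # pad odd-length input by repeating the last value
            approx.append(approx[-1])
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details


def trinucleotide_composition(seq):
    """Normalized frequencies of the 64 overlapping trinucleotides."""
    seq = seq.upper()
    counts = dict.fromkeys(TRIPLETS, 0)
    total = max(len(seq) - 2, 1)
    for i in range(len(seq) - 2):
        counts[seq[i:i + 3]] += 1
    return [counts[t] / total for t in TRIPLETS]


def pseudo_trinucleotide_composition(seq, levels=3, weight=0.1):
    """64 trinucleotide terms plus one wavelet term per level, normalized
    so that all 64 + levels features sum to 1 (hypothetical scheme)."""
    freqs = trinucleotide_composition(seq)
    details = haar_dwt(dna_walk_1d(seq), levels)
    thetas = [sum(abs(d) for d in level) / len(level) for level in details]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

In this sketch the number of levels and the choice of wavelet are the two knobs that the abstract says were tuned by 5-fold cross-validation; a library such as PyWavelets would let other wavelet families be swapped in for the hand-written Haar step.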
For the training set, the correlation coefficient R and the root-mean-square error (RMSE) were 0.9830 and 0.0907, respectively; for the test set, R and RMSE were 0.8606 and 0.1014, respectively. These results indicate that the model predicts well and that the DWT-based pseudo-trinucleotide composition effectively captures the order information of the bases in DNA sequences. The proposed method not only predicts the strength of E. coli promoters effectively but may also serve as a reference for predicting other biological functions of DNA.
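The two evaluation statistics reported above, and the 5-fold split used for model selection, can be computed as sketched below. This is an illustrative sketch only: the contiguous, unshuffled folds and the helper names `pearson_r`, `rmse`, and `k_fold_indices` are assumptions, and the SVM itself is left out (scikit-learn's `SVR` and `KFold` are common off-the-shelf choices, but the abstract does not say which software was used).

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient R between observed and predicted values."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def k_fold_indices(n, k=5):
    """Yield (train, validation) index lists for k contiguous folds;
    the first n % k folds get one extra sample."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size
```

In a selection loop, each candidate (wavelet, decomposition level) pair would be scored by averaging `rmse` (or `pearson_r`) over the five validation folds, and the best pair would then be used to train the final model on the full training set. Shuffling the sample order before splitting is usually advisable when the data are sorted by strength.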