An improved coding measure based on hexamer usage
Cited by: 2

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2005, No. 7, pp. 107-110 (4 pages)

Funding: National Natural Science Foundation of China (90203011); Natural Science Foundation of Hubei Province (2002AC014).

Abstract: Statistical characteristics of nucleotide composition are important information for identifying protein coding regions. However, the coding potential computed by coding measures widely used in gene prediction software is often closely correlated with the C+G content of the sequence, which degrades the recognition of protein coding regions. To address this, some well-known eukaryotic gene identification programs learn their parameters separately from reference sets of different C+G content. This study found an approximately linear correlation between hexamer usage preference and the C+G content of the hexamer itself. Based on this finding, an improved hexamer usage preference model is proposed that considers hexamer usage frequency together with hexamer C+G content, and thereby reduces, simply and effectively, the dependence of coding potential on sequence C+G content. Tests show that, compared with the strategy of building separate models per C+G class, the method needs less training data and recognizes protein coding regions better. It can therefore be used in gene prediction software to improve the accuracy of predicting protein coding regions and gene structure.

Keywords: coding measure; hexamer usage preference; C+G content; protein coding region recognition; gene prediction software

Classification: Q789 [Biology: Molecular Biology]
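
Illustrative sketch: this record does not give the model's formulas, so the following is only a rough illustration of the two quantities the abstract refers to, namely a conventional hexamer usage preference (log-odds of in-frame hexamer frequency in coding versus non-coding training sequences) and the C+G content of each hexamer. All function names, the pseudocount, and the sampling steps are assumptions made for this example. The correction that combines the two quantities in the improved model is not reproduced here, because the record does not specify it.

```python
import math
from collections import Counter
from itertools import product

# All 4096 possible hexamers over the DNA alphabet.
HEXAMERS = ["".join(p) for p in product("ACGT", repeat=6)]


def hexamer_freqs(seqs, step, pseudocount=1.0):
    """Relative hexamer frequencies, reading one hexamer every `step` bases.

    step=3 reads in-frame hexamers (assumes sequences start at a codon
    boundary); step=1 reads every overlapping hexamer.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 5, step):
            counts[s[i:i + 6]] += 1
    # Hexamers containing characters other than A/C/G/T are ignored.
    total = sum(counts[h] for h in HEXAMERS) + pseudocount * len(HEXAMERS)
    return {h: (counts[h] + pseudocount) / total for h in HEXAMERS}


def gc_content(kmer):
    """Fraction of C and G bases in a k-mer."""
    return sum(base in "GC" for base in kmer) / len(kmer)


def preference(coding_freq, noncoding_freq):
    """Conventional log-odds usage preference for every hexamer."""
    return {h: math.log(coding_freq[h] / noncoding_freq[h]) for h in HEXAMERS}


def coding_potential(seq, pref):
    """Mean preference of the in-frame hexamers of `seq` (0.0 if none)."""
    seq = seq.upper()
    scores = [pref[seq[i:i + 6]]
              for i in range(0, len(seq) - 5, 3)
              if seq[i:i + 6] in pref]
    return sum(scores) / len(scores) if scores else 0.0


# Toy training data, for illustration only.
coding = ["ATGGCCGCCGCCGCCGCC"]
noncoding = ["ATATATATATATATATAT"]
pref = preference(hexamer_freqs(coding, 3), hexamer_freqs(noncoding, 1))
```

With real training sets, plotting `pref[h]` against `gc_content(h)` over all hexamers is one way to inspect the kind of approximately linear relationship the abstract reports.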
