Affiliation: [1] College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2005, No. 7, pp. 107-110 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (90203011) and the Natural Science Foundation of Hubei Province (2002AC014).
Abstract: Statistical characteristics of nucleotide composition are important information for identifying protein-coding regions. However, the coding potential that widely used coding measures in gene-prediction software assign to a sequence is often closely correlated with the sequence's C+G content, which degrades the recognition of protein-coding regions. To counter this, some well-known eukaryotic gene-identification programs learn their parameters separately from reference sets with different C+G content (the class-specific modeling strategy). This study found an approximately linear correlation between hexamer usage preference and the C+G content of the hexamer itself. On that basis, an improved hexamer usage preference model is proposed that jointly considers hexamer usage frequency and hexamer C+G content, and thereby simply and effectively reduces the dependence of coding potential on sequence C+G content. Tests show that, compared with the class-specific modeling strategy, the method needs less training data and recognizes protein-coding regions better, so it can be used in gene-prediction software to improve the accuracy of predicted protein-coding regions and gene structures.
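The abstract does not give the model's formula, so the following is only a rough sketch of the general idea, not the authors' method. It assumes the usual log-ratio definition of hexamer usage preference (coding versus non-coding frequency) and, as a stand-in for the paper's correction, subtracts a least-squares linear fit of preference against the hexamer's own C+G count. All function names, the pseudocount, and the residual-based correction are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

K = 6  # hexamer length
ALL_HEXAMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def hexamer_freqs(seqs, pseudocount=1.0):
    """Relative frequency of every hexamer over a set of sequences.

    The pseudocount (an assumption) avoids zero frequencies.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - K + 1):
            h = s[i:i + K]
            if set(h) <= set("ACGT"):  # skip windows with ambiguous bases
                counts[h] += 1
    total = sum(counts.values()) + pseudocount * len(ALL_HEXAMERS)
    return {h: (counts[h] + pseudocount) / total for h in ALL_HEXAMERS}


def usage_preference(coding_seqs, noncoding_seqs):
    """Log ratio of coding to non-coding hexamer frequency (assumed definition)."""
    fc = hexamer_freqs(coding_seqs)
    fn = hexamer_freqs(noncoding_seqs)
    return {h: math.log(fc[h] / fn[h]) for h in ALL_HEXAMERS}


def gc_count(hexamer):
    """Number of C and G bases in the hexamer (0 to 6)."""
    return hexamer.count("G") + hexamer.count("C")


def gc_corrected_preference(pref):
    """Remove the linear trend of preference on hexamer C+G count.

    Hypothetical correction: fit pref ~ slope * gc + intercept by ordinary
    least squares and keep the residual for each hexamer.
    """
    xs = [gc_count(h) for h in ALL_HEXAMERS]
    ys = [pref[h] for h in ALL_HEXAMERS]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {h: pref[h] - (slope * gc_count(h) + intercept) for h in ALL_HEXAMERS}


def coding_potential(seq, pref):
    """Mean preference over all overlapping hexamers in the sequence."""
    seq = seq.upper()
    scores = [pref[seq[i:i + K]] for i in range(len(seq) - K + 1)
              if seq[i:i + K] in pref]
    return sum(scores) / len(scores) if scores else 0.0
```

In use, one would build `pref` from labelled coding and non-coding training sequences, pass it through `gc_corrected_preference`, and score candidate regions with `coding_potential`. Because the correction is a single fit over all hexamers, it needs only one training set, which is in the spirit of the abstract's claim that less training data is required than when fitting separate models per C+G class.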