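Analysis and Correction of Five Types of Errors in Model Reference Sequences of the NCBI Human Gene Database Using the In Silico Cloned Novel Genes C17orf32 and ZNF362
Times cited: 3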

Correction of Five Different Types of Errors of Model Refseqs Appeared in NCBI Human Gene Database Only by Using Two Novel Human Genes C17orf32 and ZNF362


Authors: Zhang Deli[1], Li Yanda[1], Ji Liang[1]

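Affiliation: [1] Institute of Bioinformatics, Department of Automation, School of Information Science and Technology, Tsinghua University; Tsinghua National Laboratory for Information Science and Technology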

Source: Acta Genetica Sinica, 2004, No. 4, pp. 325-334 (10 pages)

Funding: National Natural Science Foundation of China (Grant No. 30270342)

Abstract: Using a technical approach that combines bioinformatic analysis with experimental confirmation, we aligned the identified genes against the non-redundant database and found multiple errors of various types in the computer-annotated human genome coding sequences published online. This strategy helps both to discover more novel human genes and to correct errors in the reference sequences (RefSeqs) released by the genome annotation project of the US National Center for Biotechnology Information (NCBI). For example, by automated computational analysis with gene-prediction methods, NCBI predicted two model RefSeqs, LOC124919 and LOC147007, from NCBI contig NT_010808. Both should have been C17orf32, but each is a different erroneous form of C17orf32, representing error types 1 and 2, respectively. Similarly, three model RefSeqs, LOC14907, LOC200084 and LOC91126, were predicted in the same way from NCBI contig NT_004511. All three actually correspond to the single gene ZNF362, yet three different erroneous forms of ZNF362 were submitted, representing error types 4, 5 and 7, respectively. Computational identification combined with experimental verification, as used in this study, can correct or avoid existing errors in human genome coding sequences. Previously published literature has not explicitly pointed out that NCBI human model RefSeqs contain errors; computer-annotated human genome coding sequences, which may contain errors of various types, should therefore be treated with caution. The correct identification and annotation of novel human genes remains a long-term and demanding task.

We found many errors in the RefSeqs released by the NCBI genome annotation project, which indicates that the NCBI RefSeq database should be used with caution. Adopting a technical approach that combines bioinformatic analysis with experimental verification, we compared the cloned genes against the non-redundant database and found many errors in the computer-annotated human genome coding sequences published on the internet. We first cite nine types of errors in novel human genes predicted by the NCBI Genome Annotation Project. One example is given here in detail.

(1) Comparison of the sequences of the novel human gene C17orf32 and the hypothetical human gene LOC124919. LOC123722 is a modified form of the C17orf32 cDNA with a G inserted between nucleotides 406 and 407. The G at position 401 of the LOC123722 cDNA is a redundant insertion that shifts the reading frame and leads to translation of an alternative protein. This inserted G was not found in our experimental clone, is fully rejected by human EST alignment, and is shown to be redundant by analysis of the genomic GT/AG organization.

(2) Comparison of the sequences of the novel human gene C17orf32 and the hypothetical human gene LOC147007. The C17orf32 gene (ORF from nucleotide 31 to 657) is located on human chromosome 17 (accession no. NT_010808.7) and is currently linked only to the hypothetical human gene LOC147007 (ORF from nucleotide 55 to 435). This hypothetical gene sequence has not been verified experimentally and is an erroneous form of our verified C17orf32 gene. The full-length 1,679 bp cDNA sequence of C17orf32 shows overall homology to the 625 bp mRNA of LOC147007, with a matching percentage of 37% over 36% of the total window across the full-length nucleotide sequence; in particular, nucleotides 121 to 366 of LOC147007 are identical to nucleotides 316 to 561 of C17orf32. Thus, the 126-aa protein encoded by XP_097165 of LOC147007 shows overall homology to the 208-aa protein encoded by C17orf32, with a matching percentage of 50% over 48% of the total window
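The ORF coordinates in example (2) can be checked with simple arithmetic, and the extra-G error in example (1) can be illustrated on a toy sequence. The following is a minimal Python sketch, not the authors' pipeline. It assumes 1-based, inclusive coordinates whose ORF range includes the stop codon. The helper names and the 15-nt sequence `TOY_ORF` are hypothetical and are not part of C17orf32.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_protein_length(start: int, end: int) -> int:
    """Protein length (aa) of an ORF given 1-based, inclusive nucleotide
    coordinates. Assumes the range includes the stop codon."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} nt is not a multiple of 3")
    return length // 3 - 1  # the last codon is the stop codon, not an amino acid


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive nucleotide interval."""
    return end - start + 1


def codons(seq: str) -> List[str]:
    """Split a sequence into complete codons in reading frame 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def insert_base(seq: str, after_pos: int, base: str) -> str:
    """Insert one base after the given 1-based position
    (e.g. after_pos=406 puts it between nucleotides 406 and 407)."""
    return seq[:after_pos] + base + seq[after_pos:]


def first_stop(codon_list: List[str]) -> Optional[int]:
    """0-based index of the first in-frame stop codon, or None if absent."""
    for i, codon in enumerate(codon_list):
        if codon in STOP_CODONS:
            return i
    return None


# Hypothetical 15-nt toy ORF (ATG GCT GCA AAG TAA); NOT the real C17orf32 sequence.
TOY_ORF = "ATGGCTGCAAAGTAA"

if __name__ == "__main__":
    print("C17orf32 protein:", orf_protein_length(31, 657), "aa")
    print("LOC147007 protein:", orf_protein_length(55, 435), "aa")
    print("Aligned spans:", span_length(121, 366), span_length(316, 561), "nt")

    # One redundant G, analogous to the insertion described in example (1).
    shifted = insert_base(TOY_ORF, 6, "G")
    print("original:", codons(TOY_ORF), "stop at codon", first_stop(codons(TOY_ORF)))
    print("with extra G:", codons(shifted), "stop at codon", first_stop(codons(shifted)))
```

Under these assumptions, nucleotides 31 to 657 give 627 nt, or 209 codons and 208 aa, and 55 to 435 give 381 nt and 126 aa. These values agree with the protein lengths stated in the abstract, and both aligned segments (121 to 366 and 316 to 561) span 246 nt. In the toy sequence, a single inserted G changes every downstream codon and moves the original TAA stop codon out of frame. This is the kind of frameshift that a redundant base produces in a predicted cDNA.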

Keywords: human genome; expressed sequence tag; in silico cloning; model reference sequence; bioinformatics

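CLC numbers: TP392 [Automation and Computer Technology: Computer Application Technology]; Q987 [Automation and Computer Technology: Computer Science and Technology]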

 
