Improvement of genome annotation of Corynebacterium glutamicum by using protein signatures

Authors: 周大为 (Zhou Dawei) [1,2], 李炜疆 (Li Weijiang) [1,2]

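Affiliations: [1] Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, Jiangsu 214122, China; [2] School of Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China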

Source: Industrial Microbiology (《工业微生物》), 2014, Issue 3, pp. 70-76 (7 pages)

Abstract: Genes can be missed during genome annotation, even in bacteria, whose gene structures are comparatively simple. Knowledge-based gene prediction runs into difficulty when a potential gene has no significant homologues in well-curated databases. This work proposes finding missed genes by systematically scanning all possible open reading frames (ORFs) of a genome for protein sequence signatures. As a proof of concept, the genome of Corynebacterium glutamicum, an important industrial fermentation bacterium, was analysed, yielding 25 candidate ORFs that carry significant protein signatures, have no significant homologues in Swiss-Prot, and are still unannotated in GenBank. Further analysis classified 19 of them as probable genes, 3 as probable pseudogenes, and 3 as gene-like sequences. These results indicate that the method can be used effectively to search for genes that lack significant known homologues.
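The scan described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not name the signature database or tools used, so everything here is an assumption. ORFs are taken as stop-to-stop segments in all six reading frames, `min_aa` is an arbitrary length cutoff, the standard codon assignments are used for translation, and the example pattern is a regex rendering of the PROSITE P-loop motif `[AG]-x(4)-G-K-[ST]`, chosen only as a stand-in signature. The function names `six_frame_orfs` and `scan_signature` are hypothetical.

```python
import re
from itertools import product

# Standard codon assignments, listed in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(genome, min_aa=30):
    """Yield (strand, frame, protein) for each stop-to-stop segment.

    Assumption: an ORF is any stretch between stop codons that is at
    least min_aa residues long; no start codon is required.
    """
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_aa:
                    yield strand, frame, segment


def scan_signature(genome, pattern, min_aa=30):
    """Return the ORFs whose translation matches the signature regex."""
    regex = re.compile(pattern)
    return [hit for hit in six_frame_orfs(genome, min_aa) if regex.search(hit[2])]


# Stand-in signature: P-loop motif [AG]-x(4)-G-K-[ST] written as a regex.
P_LOOP = r"[AG].{4}GK[ST]"
```

Calling `scan_signature(genome_sequence, P_LOOP)` on a genome string returns the signature-carrying ORFs. In the workflow the abstract outlines, such candidates would then be compared against Swiss-Prot and against the existing GenBank annotation; those steps are outside this sketch.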

Keywords: protein sequence signature; Corynebacterium glutamicum; genome annotation

Classification: Q78 [Biology: Molecular Biology]

 
