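Affiliations: [1] College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002; [2] College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002; [3] Fujian Province Key Laboratory of Pathogenic Fungi and Mycotoxins, Fujian Agriculture and Forestry University, Fuzhou 350002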
Source: Acta Entomologica Sinica (《昆虫学报》), 2020, No. 11, pp. 1345-1357 (13 pages)
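Funding: Earmarked Fund for the China Agriculture Research System (CARS-44-KXJ7); Natural Science Foundation of Fujian Province (2018J05042); Education and Research Project for Young and Middle-aged Teachers of the Fujian Provincial Department of Education (JAT170158); Master's Supervisor Team Project of Fujian Agriculture and Forestry University (Guo Rui); Open Project of the Fujian Province Key Laboratory of Pathogenic Fungi and Mycotoxins (Fujian Agriculture and Forestry University); Outstanding Master's Thesis Fund of Fujian Agriculture and Forestry University (Du Yu).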
Abstract: 【Aim】This study aims to improve the annotation of the current reference genome of Ascosphaera apis using previously obtained nanopore long-read sequencing data, and to identify and functionally annotate unannotated novel genes and novel transcripts. 【Methods】Based on the previously obtained nanopore long-read sequencing data, full-length transcripts of A. apis were compared with the transcripts annotated in the reference genome using gffcompare, and the untranslated regions (UTRs) of the annotated genes were extended accordingly. The open reading frames (ORFs) of A. apis genes and their corresponding amino acid sequences were predicted with TransDecoder. MISA was used to identify simple sequence repeat (SSR) loci in full-length transcripts longer than 500 bp. Using Blast, the novel genes and novel transcripts were aligned against the Nr, KOG, eggNOG, Swiss-Prot, Pfam, GO and KEGG databases for functional annotation. 【Results】In total, the UTRs of 9481 A. apis genes were extended, of which 4744 and 4737 genes were extended at the 5′UTR and 3′UTR, respectively. A total of 10492 complete ORFs were predicted; ORFs encoding 0-100 aa and 100-200 aa were the most abundant, accounting for 38.96% and 36.90% of all ORFs, respectively. A total of 5286 SSRs were identified, comprising 1870 mononucleotide, 826 dinucleotide, 2398 trinucleotide, 138 tetranucleotide, 43 pentanucleotide and 11 hexanucleotide repeats. In addition, 1558 novel genes were identified, of which 1556, 731, 330, 592, 1177, 709 and 589 were annotated in the Nr, Swiss-Prot, Pfam, KOG, eggNOG, GO and KEGG databases, respectively. Furthermore, 14403 novel transcripts were identified, of which 14376, 8524, 7276, 7405, 12035, 7891 and 6855 were annotated in the same seven databases, respectively. 【Conclusion】Using previously obtained nanopore long-read sequencing data, this study predicted the complete ORFs of A. apis, extended the UTRs of annotated genes in the reference genome, and identified previously unannotated SSR loci; a large number of unannotated novel genes and novel transcripts were also identified and functionally annotated. The results considerably improve the existing genome annotation of A. apis and provide a foundation for further omics and molecular biology research on this fungus.
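The SSR survey described in the abstract classifies repeats by motif length (mono- to hexanucleotide). As a rough illustration of what such a scan involves, the following is a minimal Python sketch that finds perfect tandem repeats with regular expressions. It is not MISA's implementation: the function names `find_ssrs` and `is_primitive` are hypothetical, and the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions rather than the thresholds used in the study.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only;
# the abstract does not state the thresholds used in the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ACAC')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A motif of length k followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
                pos = m.end()
            else:
                # Skip motifs such as 'AA' that belong to a shorter repeat class,
                # advancing one base so a later primitive repeat is not missed.
                pos = m.start() + 1
    return sorted(hits)


if __name__ == "__main__":
    # Toy transcript fragment (made up for illustration, not real A. apis sequence).
    demo = "GG" + "A" * 12 + "CT" + "AG" * 7 + "TTC" + "CAG" * 5 + "T"
    for start, motif, count in find_ssrs(demo):
        print(start, motif, count)
```

Counting the returned hits by motif length would give a per-class tally analogous to the mono- through hexanucleotide counts reported above; a real analysis would also need to handle compound and imperfect repeats, which this sketch ignores.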
Keywords: Ascosphaera apis; long-read sequencing technology; full-length transcriptome; genome; honeybee; chalkbrood
Classification No.: S895.3 [Agricultural Sciences: Special Economic Animal Husbandry]