Full-length transcriptome sequencing on the PacBio platform (基于PacBio平台的全长转录组测序)  Cited by: 20


Authors: Ren Yipeng [1], Zhang Jiaqing [1], Sun Yu [1], Wu Zhenfeng [2], Ruan Jishou [2], He Bingjun [1], Liu Guoqing [1], Gao Shan [1], Bu Wenjun [1]

Affiliations: [1] College of Life Sciences, Nankai University, Tianjin 300071; [2] School of Mathematical Sciences, Nankai University, Tianjin 300071

Source: Chinese Science Bulletin (《科学通报》), 2016, No. 11, pp. 1250-1254 (5 pages)

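Funding: Supported by the Nankai University 2015 Graduate Research Innovation Program and the National Natural Science Foundation of China (31371974; 31201738)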

Abstract: At present, the vast majority of transcriptome data are obtained with second-generation high-throughput sequencing technologies, typified by the Illumina platform. However, second-generation sequencing cannot provide large numbers of long transcripts and loses important information such as alternative splicing, which greatly limits the in-depth use of transcriptome data. Third-generation sequencing technologies, typified by PacBio, can yield longer and even full-length transcriptomes, but because PacBio transcriptome sequencing has emerged only in recent years, transcriptomes have been obtained on the PacBio platform for only a small number of species. PacBio full-length transcriptome sequencing has only just begun internationally but is developing rapidly, and research on its experimental and data-analysis standards and on quality control is essential for future large-scale application. This study is the first internationally to attempt to optimize experimental parameters for the latest PacBio reagents (P6/C4), design quality-control metrics, and standardize full-length transcriptome sequencing. Based on a set of full-length transcriptome data from an insect, the stink bug Erthesina fullo (麻皮蝽), this paper reports some of the results obtained so far.

Next Generation Sequencing (NGS) technology, particularly the Illumina platform, has produced most of the animal and plant transcriptomes to date, but the short reads from NGS sequencers result in incompletely assembled transcripts that lack some important information (e.g. alternative splicing). This limits a deeper understanding of transcriptome data. Based on single-molecule real-time (SMRT) sequencing technology, the PacBio platform can provide longer and even full-length transcripts that originate from observations of single molecules without assembly. Full-length transcripts can be used to investigate alternative splicing, alternative polyadenylation, novel genes, non-coding RNAs, fusion transcripts, etc. By the end of 2015, the transcriptomes of a few species had been sequenced using the PacBio platform. They are classified into three groups. The first group includes human lymphoblastoid cells and Salvia miltiorrhiza, sequenced using a combination of NGS short reads and SMRT technology. The second group includes HIV-1, bovine immunoglobulin G, human embryonic stem cells, mouse neurexins and Propithecus coquereli, sequenced using SMRT. The third group includes European cuttlefish, tetraploid cotton and fungi, sequenced using SMRT with the latest PacBio full-length transcriptome data analysis pipeline, IsoSeq. The use of the SMARTer PCR cDNA Synthesis Kit and the IsoSeq data analysis pipeline was recommended to facilitate full-length transcriptome sequencing.

However, transcriptome data quality can be affected by ribosomal RNA contamination, cross-contamination on agarose gel, the effect of size selection using gel or BluePippin, the prevalence of PCR chimera products and the incorrect removal of SMRTbell adapters. Although IsoSeq can remove artificial concatemers that are produced because of an insufficient SMRTbell amount during the sequencing library preparation step, some problems still exist. For example, IsoSeq cannot distinguish PCR chimeras from true fusion genes. Another critical problem is the misidentification

Keywords: full-length transcriptome; single-molecule sequencing; PacBio; quality control; standard protocol

Classification code: Q78 [Biology: Molecular Biology]

 
