Construction of a Quality Assessment System for Illumina-Solexa Sequencing Data

Evaluation System for Sequencing Data Generated by Illumina-Solexa Platform

Authors: Song Linlin (宋琳琳)[1,2,3], Gu Zhaohui (顾朝辉)[1,2,3], Wei Chaochun (韦朝春)[1,4], Chen Saijuan (陈赛娟)[2,3]

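Affiliations: [1] School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240; [2] Institute of Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240; [3] State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025; [4] Shanghai Center for Bioinformation Technology, Shanghai 200235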

Source: Progress in Modern Biomedicine (《现代生物医学进展》), 2009, No. 15, pp. 2899-2902, 2912 (5 pages)

Abstract: Objective: To develop data analysis and quality assessment methods suited to next-generation sequencing (NGS) data, which consist of very large numbers of short reads. Methods: Published sequencing data from the Illumina-Solexa platform were used as the study material. The MAQ software was used to align the reads to the human whole-genome sequence and to extract per-site information from the alignment result. Taking exon regions as an example, the quality of the sequencing data was then evaluated at the site level. Results: By combining existing software with a linear algorithm developed in this work, a quality assessment system covering the alignment, assembly, and assessment of sequencing data was established. After alignment, the raw reads covered 127,113,378 sites and involved 64,868 exons on the 24 human chromosomes. Of these exons, 0.50% were covered at every site and 3.98% had an average per-site sequencing depth of at least 1. Conclusions: A data analysis and quality assessment method based on the Illumina-Solexa sequencing platform was successfully established. With little adaptation, the analysis software and quality-checking algorithm could also be applied to other next-generation sequencing platforms. On the basis of this quality assessment, researchers can refine their sequencing experiment design and carry out SNP and mutation screening and subsequent functional studies with more reliable results.
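The abstract does not describe the linear algorithm itself, so the following is only a minimal sketch of how the two reported exon-level metrics (share of exons covered at every site, and share with mean per-site depth of at least 1) could be computed in a single linear pass. Everything here is an assumption made for illustration: the names `ExonStats`, `exon_coverage` and `summarize`, the input layout (per-chromosome lists of `(name, start, end)` exons and `(position, depth)` sites, both sorted, with non-overlapping exons and unique positions), and the choice to take percentages over the exons supplied. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExonStats:
    """Per-exon coverage summary (hypothetical structure for illustration)."""
    name: str
    length: int          # number of sites in the exon
    covered_sites: int   # sites with depth > 0
    total_depth: int     # sum of depths over the exon

    @property
    def fully_covered(self) -> bool:
        return self.covered_sites == self.length

    @property
    def mean_depth(self) -> float:
        return self.total_depth / self.length


def exon_coverage(
    exons: List[Tuple[str, int, int]],
    sites: List[Tuple[int, int]],
) -> List[ExonStats]:
    """Two-pointer sweep over one chromosome.

    Assumes exons are (name, start, end) with 1-based inclusive coordinates,
    sorted by start and non-overlapping, and sites are (position, depth)
    sorted by position with each position listed at most once.
    Runs in O(len(exons) + len(sites)).
    """
    stats = []
    i, n = 0, len(sites)
    for name, start, end in exons:
        # Skip sites that lie before the current exon.
        while i < n and sites[i][0] < start:
            i += 1
        covered = total = 0
        # Accumulate every site that falls inside the exon.
        while i < n and sites[i][0] <= end:
            depth = sites[i][1]
            if depth > 0:
                covered += 1
                total += depth
            i += 1
        stats.append(ExonStats(name, end - start + 1, covered, total))
    return stats


def summarize(stats: List[ExonStats]) -> Tuple[float, float]:
    """Return (% exons covered at every site, % exons with mean depth >= 1)."""
    if not stats:
        return 0.0, 0.0
    full = sum(1 for s in stats if s.fully_covered)
    deep = sum(1 for s in stats if s.mean_depth >= 1)
    return 100.0 * full / len(stats), 100.0 * deep / len(stats)
```

Under these assumptions, one would call `exon_coverage` once per chromosome with the per-site depths taken from the alignment output, concatenate the resulting lists, and pass them to `summarize` to obtain the two percentages.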

Keywords: next-generation sequencing; Illumina-Solexa sequencing platform; MAQ alignment software; sequencing quality assessment

Classification codes: Q75 [Biology: Molecular Biology]; Q78

 
