The Effect of Sequencing Error and Repetitive Sequence on de novo SNP Calling

Original title (Chinese): 测序错误和重复序列对无参照基因组单核苷酸多态性分型的影响

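Authors: Dou Jinzhuang (窦锦壮)[1,2], Zhao Xiqiang (赵熙强)[2], Fu Xiaoteng (付晓腾)[1], Jiao Wenqian (焦文倩)[1], Wang Nannan (王南南)[2], Zhang Lingling (张玲玲)[1], Hu Xiaoli (胡晓丽)[1], Wang Shi (王师)[1], Bao Zhenmin (包振民)[1]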

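Affiliations: [1] Key Laboratory of Marine Genetics and Breeding, Ministry of Education, College of Marine Life Sciences, Ocean University of China, Qingdao, Shandong 266003; [2] School of Mathematical Sciences, Ocean University of China, Qingdao, Shandong 266100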

Source: Periodical of Ocean University of China (Natural Science Edition) (《中国海洋大学学报(自然科学版)》), 2013, No. 5, pp. 120-124 (5 pages)

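Funding: Key Program of the National Natural Science Foundation of China (31130054); National Basic Research Program of China (2010CB126402); National High Technology Research and Development Program of China (2012AA10A405); Program for New Century Excellent Talents in University, Ministry of Education (NCET-10-0761)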

Abstract: Single nucleotide polymorphisms (SNPs) are the most abundant type of genetic variation in eukaryotic genomes and are regarded as ideal molecular markers for revealing genetic variation. In recent years, several genotyping methods based on high-throughput sequencing platforms, such as RAD, GBS, RRLs, and 2b-RAD, have been developed. Most of them use restriction enzymes for genome complexity reduction (GCR) to lower the total sequencing cost, and they have become an effective route to large-scale de novo SNP marker development and large-sample population genetic studies in non-model organisms, especially aquatic animals. This paper discusses, from a theoretical standpoint, the effect of sequencing error and repetitive sequence on de novo SNP calling, and validates the theoretical analysis with simulated RAD data from the model organism Arabidopsis thaliana. Both the theoretical derivation and the simulation show that, at an average sequencing coverage of about 15~20X, the probability of detecting a SNP located in a single-copy region exceeds 95%. Requiring each allele to be supported by at least 2 reads at the SNP calling stage effectively screens out the effect of sequencing error, keeping the false positive rate below 2%. These results provide theoretical guidance for de novo SNP calling in real data.
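The relationship between coverage, the two-read support threshold, and detection probability can be illustrated with a minimal sketch. It is not the paper's derivation, which the abstract does not give. It assumes that read depth at a single-copy heterozygous site is Poisson distributed, that each allele receives half of the mean coverage, and that a sequencing error turns a read into one specific wrong base with probability `error_rate / 3`. The function names and the 1% error rate used in the usage lines are hypothetical.

```python
from math import exp


def poisson_tail_ge(k_min: int, lam: float) -> float:
    """Return P(X >= k_min) for X ~ Poisson(lam)."""
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for k in range(k_min):
        cdf += term            # accumulate P(X = k)
        term *= lam / (k + 1)  # advance to P(X = k + 1)
    return 1.0 - cdf


def het_detection_prob(coverage: float, min_support: int = 2) -> float:
    """Probability that both alleles of a heterozygous site reach min_support reads.

    Assumption (not from the source): each allele's depth is an independent
    Poisson variable with mean coverage / 2.
    """
    lam = coverage / 2.0
    return poisson_tail_ge(min_support, lam) ** 2


def spurious_allele_prob(coverage: float, error_rate: float,
                         min_support: int = 2) -> float:
    """Probability that one specific wrong base reaches min_support reads.

    Assumption (not from the source): errors are independent and uniform
    over the three wrong bases, so the mean is coverage * error_rate / 3.
    """
    lam = coverage * error_rate / 3.0
    return poisson_tail_ge(min_support, lam)


if __name__ == "__main__":
    for c in (5, 10, 15, 20):
        print(c, round(het_detection_prob(c), 4),
              round(spurious_allele_prob(c, 0.01), 5))
```

Under these assumptions, detection at 15X and 20X comes out above 95%, which is in line with the range quoted in the abstract. The spurious-allele figure depends entirely on the assumed error rate, so it should not be read as a reproduction of the reported 2% false positive rate.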

Keywords: de novo SNP calling; sequencing error; repetitive sequence

Classification code: S917 [Agricultural Sciences: Fisheries Science]

 
