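Affiliations: [1] Key Laboratory of Aquatic Biodiversity and Conservation of the Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072 [2] Graduate University of the Chinese Academy of Sciences, Beijing 100049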
Source: Hereditas (Beijing) (《遗传》), 2011, No. 6, pp. 654-660 (7 pages)
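Funding: Supported by the State Key Laboratory of Freshwater Ecology and Biotechnology (Grant No. 2011FB17) and the National Basic Research Program of China (973 Program) (Grant No. 2008CB418002)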
Abstract: Next-generation sequencing (NGS) and comparative whole-genome analysis are active research areas in modern biology and information science, and the analysis of transposable elements has become an important part of comparative genomics. At present, the types, numbers, and composition of transposable elements are generally mined and analyzed from fully assembled genome sequences. The post-processing and assembly of the huge number of short reads that precede a finished genome remain a weak point of genome research, and repeat sequences, which consist mainly of transposable elements, are inevitably misassembled or lost during assembly, which makes analyses of the transposable element system uncertain. This study aimed to establish an analysis workflow and applied it to the transposable element composition (mainly insertion sequences, IS) of a randomly simulated Roche 454 raw data set constructed from the complete genome of Microcystis aeruginosa NIES 843 at 33-fold coverage. The most reliable results were obtained by dividing the candidate sequences from nucleic acid probe scanning into three groups and setting separate amino acid (transposase) detection thresholds. Under this scheme, transposable (IS) elements accounted for 10.38% of the M. aeruginosa NIES 843 genome and belonged to 14 IS families and 66 IS subfamilies. Compared with the results of two different earlier workflows based on the fully assembled genome, abundance and family/subfamily composition showed no significant difference, and a high proportion of the IS element sequences overlapped, which supports the reliability and practicality of this workflow for transposable element analysis based on raw high-throughput sequencing data.
Keywords: cyanobacterial genome; insertion sequence; IS family; transposable element; Roche 454 raw sequencing data