RNA-Seq-based assessment for genome annotation databases


Authors: Wang Ying (王颖)[1], Liu Lin (刘麟)[1]

Affiliation: [1] Department of Automation, School of Information Science and Technology, Xiamen University, Xiamen 361000

Source: Chinese Science Bulletin (《科学通报》), 2013, No. 33, pp. 3471-3482 (12 pages)

Funding: Supported by the National Natural Science Foundation of China (61203282; 61202144) and the U.S. National Institutes of Health (NIH/NIMH 5 RC2 MH090047-01)

Abstract: RNA-Seq data from next-generation sequencing have brought a breakthrough in decoding eukaryotic transcriptomes. Because RNA-Seq offers resolution down to the single-nucleotide level, it can serve as the sole data source for annotating a whole genome. By the same token, RNA-Seq can be used to validate existing annotations of splice junctions, exons, and transcripts. This study therefore proposes a scheme that uses RNA-Seq data alone to evaluate genome annotation databases: based on RNA-Seq read alignments, it defines specificity and sensitivity measures at the gene, transcript, exon, splice-junction, and nucleotide levels, and uses them to assess the accuracy (specificity) and completeness (sensitivity) of a database. The scheme was applied to 5 representative, widely used human genome annotation databases using 1.1 billion high-quality RNA-Seq reads from 16 human tissues. Accurately annotated transcripts were then collected from the 5 databases to build combined databases of accurately annotated transcripts for each of the 16 tissues and for the whole human body. In addition, an assessment of the current rhesus macaque annotation database showed that it is far from complete and that its annotations are considerably less accurate than those in the human databases. An RNA-Seq analysis pipeline was built to implement the scheme, enabling comprehensive, fast, and efficient assessment and validation of genome annotations for various organisms across the whole transcriptome. The pipeline can be downloaded from http://code.google.com/p/genome-annotation-assessment-pipeline/downloads/.
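The abstract does not spell out how the sensitivity and specificity measures are computed, so the following is only a minimal sketch under an assumed definition that is common in annotation evaluation: at a given feature level (here, splice junctions), sensitivity is taken as the fraction of RNA-Seq-supported features that the database annotates, and specificity as the fraction of annotated features that RNA-Seq supports. The function name `assess_annotation`, the junction tuple layout, and the example coordinates are all hypothetical and are not taken from the paper or its pipeline.

```python
from typing import Hashable, Set, Tuple


def assess_annotation(annotated: Set[Hashable],
                      observed: Set[Hashable]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one feature level.

    Assumed definitions (not confirmed by the abstract):
      sensitivity = |annotated & observed| / |observed|   (completeness)
      specificity = |annotated & observed| / |annotated|  (accuracy)
    """
    supported = annotated & observed  # annotated features confirmed by reads
    sensitivity = len(supported) / len(observed) if observed else 0.0
    specificity = len(supported) / len(annotated) if annotated else 0.0
    return sensitivity, specificity


# Hypothetical splice junctions encoded as (chromosome, donor, acceptor).
annotated_junctions = {
    ("chr1", 100, 200),
    ("chr1", 300, 400),
    ("chr2", 50, 90),
    ("chr2", 120, 180),   # annotated, but no read support in this toy data
}
observed_junctions = {
    ("chr1", 100, 200),
    ("chr1", 300, 400),
    ("chr2", 50, 90),
    ("chr3", 10, 70),     # seen in reads, missing from the annotation
    ("chr3", 90, 150),    # seen in reads, missing from the annotation
}

sens, spec = assess_annotation(annotated_junctions, observed_junctions)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The same function could be applied at the gene, transcript, exon, or nucleotide level by changing what each set element represents; how read support is decided at each level (for example, a minimum number of spanning reads) is left open here because the abstract does not specify it.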

Keywords: genome; transcriptome; annotation database; RNA-Seq; sensitivity; specificity

Classification code: Q78 [Biology: Molecular Biology]

 
