Gene expression analysis of high-throughput RNA-Seq sequencing data by Unix text alignment


Authors: SONG Dongguang, LU Bobin, CHEN Liuting (Department of Horticulture, Foshan University, Foshan 528231, Guangdong, China)

Affiliation: [1] Department of Horticulture, Foshan University, Foshan 528231, Guangdong, China

Source: Chinese Journal of Bioinformatics (《生物信息学》), 2018, No. 2, pp. 119-129 (11 pages)

Abstract: Methods for aligning and assembling short high-throughput RNA-Seq reads into longer transcripts and for estimating gene expression continue to improve as transcriptome sequencing becomes widespread. In this study, combinations of text-processing commands on Unix-like systems were used to align and assemble transcriptome sequences from leaves and petals of Camellia at the flowering stage and to analyze their expression levels. First, the sequencing reads were quasi-randomly sorted in blocks of 10,000, and 100,000 sequences were selected as queries and aligned against 1 million sequences. From each query sequence, nine randomly selected 20-mers were matched against the 1 million sequences, and the transcript count for that query was obtained after removing duplicate matches. Assembly was then carried out within the matched contigs of each aligned group, using the first and last 20-mers of the query sequence. The longest sequence from the first round of assembly was 410 mer. Where two or more assembled sequences were available, they were re-aligned against one another and reassembled, giving a longest sequence of 1,174 mer. The number of matches for each query sequence was used as its expression level before and after assembly, and it was comparable to the minus-strand expression obtained by aligning with the complementary strand. Gene annotations were obtained by submitting the assembled sequences to the remote NCBI BLAST server. The results show that text alignment on Unix-like systems can be used effectively for gene expression analysis and de novo assembly of high-throughput sequencing data.
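The counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used Unix text-processing commands that the abstract does not list. The function names (`reverse_complement`, `sample_kmers`, `count_matches`), the fixed random seed, and the use of plain substring search are all assumptions made for illustration. Only the 20-mer length and the nine sampled groups come from the abstract.

```python
import random

K = 20        # k-mer length reported in the abstract
N_GROUPS = 9  # number of 20-mers sampled per query, as in the abstract


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def sample_kmers(query, n_groups=N_GROUPS, k=K, rng=None):
    """Pick up to n_groups k-mers at distinct random start positions."""
    rng = rng or random.Random(0)  # fixed seed: an assumption, for repeatability
    n_positions = len(query) - k + 1
    if n_positions <= 0:
        return []
    starts = rng.sample(range(n_positions), min(n_groups, n_positions))
    return [query[s:s + k] for s in starts]


def count_matches(query, reads, n_groups=N_GROUPS, k=K, rng=None):
    """Count distinct reads containing at least one sampled k-mer.

    Collecting read indices in a set plays the role of the
    deduplication step: a read hit by several k-mers counts once.
    """
    kmers = sample_kmers(query, n_groups, k, rng)
    matched = {i for i, read in enumerate(reads)
               if any(kmer in read for kmer in kmers)}
    return len(matched)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    query = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAGTTGACCGTAAGCTTACGGATC"
    reads = [query, "GG" + query + "CC", "T" * 60]
    print("plus-strand matches:", count_matches(query, reads))
    print("minus-strand matches:",
          count_matches(reverse_complement(query), reads))
```

Running the same count with the reverse-complemented query corresponds to the minus-strand comparison mentioned in the abstract. On real data, a linear scan over 1 million reads per query would be slow, so an indexed lookup would likely be needed.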

Keywords: RNA-Seq; text alignment; gene expression level; contig assembly; Unix-like system

Classification number: Q344.13 [Biology: Genetics]

 
