Parallel Algorithm Based Yarn Cloud Platform for Genetic Multi-sequence Alignment  Cited by: 4

Authors: Deng Xiaoyan [1]; Xu Shengchao [2]

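Affiliations: [1] Jiangsu Food and Pharmaceutical Science College, Huai'an 223005; [2] School of Electronic and Information Engineering, Qinzhou University, Qinzhou 535011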

Source: Genomics and Applied Biology (《基因组学与应用生物学》), 2019, No. 7, pp. 3009-3015 (7 pages)

Abstract: To address the slow computation and outdated software of gene multiple sequence alignment in bioinformatics, this paper proposes Yarn_clustalW, a parallel multiple sequence alignment method based on the Yarn (Yet Another Resource Negotiator) cloud platform. The mathematical model of the ClustalW algorithm and its task partitioning for MapReduce are analyzed. Yarn_clustalW adopts a threshold-scale-based task partitioning scheme that takes both sequence length and number of sequences into account. The method was tested using GenBank gene data from NCBI as the case study. Experimental results show that Yarn_clustalW achieves a shorter running time and a higher speedup than the serial ClustalW method. This can save biological researchers considerable time and effort, facilitate the discovery of drug targets, and shorten the development cycle of biopharmaceuticals.
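The abstract does not spell out the threshold-scale partition rule, so the following is only an illustrative sketch of how a MapReduce task-partitioning step could weigh both sequence length and number of sequences. It assumes that the work being split is the all-pairs alignment stage of ClustalW, that the cost of one pairwise alignment is proportional to the product of the two sequence lengths, and that jobs are packed greedily into map tasks under a cost `threshold`. The names `estimated_cost` and `partition_pairs`, the cost model and the packing strategy are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def estimated_cost(len_a: int, len_b: int) -> int:
    """Hypothetical cost model: a dynamic-programming pairwise alignment fills
    a len_a x len_b matrix, so cost grows with the product of the lengths."""
    return len_a * len_b


def partition_pairs(lengths: Dict[str, int], threshold: int) -> List[List[Pair]]:
    """Hypothetical threshold-based partition (NOT the paper's exact scheme).

    Every unordered pair of sequences is one pairwise-alignment job. Jobs are
    sorted by estimated cost, largest first, and packed greedily into map
    tasks; a task is closed when adding the next job would push its total
    estimated cost above `threshold`. A single job that alone exceeds the
    threshold still gets a task of its own.
    """
    jobs = sorted(
        combinations(sorted(lengths), 2),
        key=lambda p: estimated_cost(lengths[p[0]], lengths[p[1]]),
        reverse=True,
    )
    tasks: List[List[Pair]] = []
    current: List[Pair] = []
    current_cost = 0
    for a, b in jobs:
        cost = estimated_cost(lengths[a], lengths[b])
        if current and current_cost + cost > threshold:
            tasks.append(current)  # close the full task
            current, current_cost = [], 0
        current.append((a, b))
        current_cost += cost
    if current:
        tasks.append(current)
    return tasks


if __name__ == "__main__":
    # Made-up sequence lengths, for illustration only.
    lengths = {"seqA": 300, "seqB": 250, "seqC": 120, "seqD": 80}
    for i, task in enumerate(partition_pairs(lengths, threshold=80_000)):
        print(i, task)
```

With n sequences there are n(n-1)/2 pairwise jobs, and each job's cost grows with both lengths, so a fixed threshold yields more map tasks for larger or longer datasets. This is one plausible way to balance the two factors the abstract mentions; the paper's actual rule may differ.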

Keywords: multiple sequence alignment; cloud computing; MapReduce; Yarn framework; bioinformatics

CLC number: Q811.4 [Biology / Bioengineering]

 
