Research on Parallel Acceleration Techniques for High-Throughput Gene Sequence Alignment Based on Hash Index  Cited by: 4

Parallel Accelerator Design for High-Throughput DNA Sequence Alignment with Hash-Index


Authors: Wang Wendi (王文迪)[1,2], Tang Wen (汤文)[1,2], Duan Bo (段勃)[1,2], Zhang Chunming (张春明)[1], Zhang Peiheng (张佩珩)[1], Sun Ninghui (孙凝晖)[1,3]

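Affiliations: [1] High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049; [3] State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190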

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2013, No. 11, pp. 2463-2471 (9 pages)

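Funding: National Basic Research Program of China (973 Program) (2012CB316502); National High Technology Research and Development Program of China (863 Program) (2009AA01A129); Major Project of the Knowledge Innovation Program of the Chinese Academy of Sciences (KGCX1-YW-13); National Natural Science Foundation of China (60803030; 60633040; 60925009; 60921002)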

Abstract: In recent years, with the rapid development of high-throughput gene sequencing technology, sequencing cost and turnaround time have both fallen sharply. However, the massive data output of next-generation sequencing and the high concurrency inherent in sequencing algorithms pose new challenges to the computing power of existing computers. Taking PerM, an open-source re-sequencing program built on a Hash-index algorithm, as an example, this paper studies key techniques for accelerating the application on commercial multi-core CPUs. Experiments on a 64-core SMP system show that the proposed optimizations reduce the cache miss rate by 90% and improve performance by 4 to 11 times. The paper then discusses the design and implementation of a dedicated parallel acceleration system on an accelerator card containing a Xilinx LX330 FPGA. As a prototype verification system, a systolic-array parallel computing system with 11 processing elements was designed and implemented on an FPGA-based PCIe accelerator card. Compared with an Intel Xeon X7550 8-core CPU, the proposed parallel accelerator has a 30 to 65 times advantage in performance per watt.

In recent years, due to the rapid development of high-throughput next generation sequencing (NGS) technologies, sequencing cost and time have been greatly reduced. However, both the explosion of generated NGS data and the massively parallel computation involved pose great challenges to the capability of existing computers. We take an open-source hash-index-based re-sequencing algorithm, called PerM, as an example to investigate optimizations for accelerating NGS with commercial multi-core CPUs as well as with customized parallel architectures. Firstly, we optimize the original algorithm by reordering the bucket access sequence so that data locality in the shared cache is improved. Secondly, to exclude the empty hash buckets, we propose a hash-index compression algorithm, which matches the sequential access pattern of the optimized algorithm. Experiments on a 64-core SMP (Intel Xeon X7550) show that the optimized algorithm reduces the LLC miss ratio to about 10% of that of the original algorithm, so overall performance improves by 4 to 11 times. Furthermore, a parallel accelerator architecture is designed and evaluated on our customized FPGA accelerator card equipped with a Xilinx LX330 FPGA. As a prototype, a systolic array of 100 PEs is built, which operates at 175 MHz. The performance of the proposed parallel accelerator architecture is demonstrated by the reported speedup of 30 to 65 times over an 8-core CPU.
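The two CPU-side ideas named in the abstract, visiting hash buckets in a reordered sequence and storing only the non-empty buckets, can be illustrated with a small sketch. This is not PerM's code or the authors' implementation: the seed length `K`, the 2-bit base encoding, and the function names `encode`, `build_compressed_index`, and `lookup_sorted` are all assumptions made for illustration, and only exact matching of a read's first k-mer is shown.

```python
from collections import defaultdict

K = 4  # hypothetical seed length, chosen only to keep the example small


def encode(kmer):
    """Pack a k-mer over the alphabet ACGT into an integer bucket key (2 bits per base)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    key = 0
    for base in kmer:
        key = (key << 2) | code[base]
    return key


def build_compressed_index(reference, k=K):
    """Index every k-mer of the reference, keeping only non-empty buckets.

    Returns (keys, offsets, positions):
      keys      - sorted keys of the non-empty buckets
      offsets   - offsets[i]..offsets[i+1] delimits bucket i inside positions
      positions - reference positions, stored bucket by bucket
    """
    buckets = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        buckets[encode(reference[pos:pos + k])].append(pos)
    keys = sorted(buckets)
    offsets, positions = [0], []
    for key in keys:
        positions.extend(buckets[key])
        offsets.append(len(positions))
    return keys, offsets, positions


def lookup_sorted(index, reads, k=K):
    """Look up the first k-mer of each read, visiting buckets in ascending key order."""
    keys, offsets, positions = index
    # Sorting the queries by bucket key turns random probes into one forward sweep.
    queries = sorted((encode(read[:k]), i) for i, read in enumerate(reads))
    hits = [[] for _ in reads]
    cursor = 0  # only ever moves forward through the compressed index
    for key, i in queries:
        while cursor < len(keys) and keys[cursor] < key:
            cursor += 1
        if cursor < len(keys) and keys[cursor] == key:
            hits[i] = positions[offsets[cursor]:offsets[cursor + 1]]
    return hits
```

Calling `lookup_sorted(build_compressed_index(reference), reads)` returns, for each read, the reference positions where its first k-mer occurs. Because the queries are sorted and the index holds only non-empty buckets in key order, the lookup walks memory sequentially, which is the kind of access pattern the abstract credits for the lower cache miss ratio.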

Keywords: Hash index; bioinformatics; high-throughput sequencing; FPGA; parallel accelerator

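CLC number: TP302 [Automation and Computer Technology: Computer System Architecture]; TP334.7 [Automation and Computer Technology: Computer Science and Technology]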

 
