Coupling SIMD and SIMT Architectures to Boost Performance of a Phylogeny-aware Alignment Kernel
Authors: Nikolaos Ch. Alachiotis, Simon A. Berger, Alexandros Stamatakis
Abstract:
BACKGROUND: Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool, we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to reference alignments and the corresponding evolutionary reference trees. The algorithm aligns short reads to phylogenetic profiles that correspond to the branches of such a reference tree. Because it must perform an immense number of pairwise alignments, we explore vector intrinsics and GPUs to accelerate the PaPaRa alignment kernel.

RESULTS: We optimized and parallelized PaPaRa on CPUs and GPUs. Using SSE 4.1 SIMD (Single Instruction, Multiple Data) intrinsics for x86 SIMD architectures together with multi-threading, we obtained a 9-fold acceleration on a single core as well as linear speedups with respect to the number of cores. Peak CPU performance reaches 18.1 GCUPS (giga cell updates per second) using all four physical cores of an Intel i7 2600 CPU running at 3.4 GHz; average CPU performance over all test runs is 12.33 GCUPS. We also used OpenCL to execute PaPaRa on a GPU SIMT (Single Instruction, Multiple Threads) architecture: an NVIDIA GeForce 560 GPU delivered peak and average performance of 22.1 and 18.4 GCUPS, respectively. Finally, we combined the SIMD and SIMT implementations into a hybrid CPU-GPU system that achieved an accumulated peak performance of 33.8 GCUPS.

CONCLUSIONS: This accelerated version of PaPaRa (available at www.exelixis-lab.org/software.html) provides a significant performance improvement that allows larger datasets to be analyzed in less time. We observe that state-of-the-art SIMD and SIMT architectures deliver comparable performance for this dynamic programming kernel when the "competing programmer approach" is deployed.
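The abstract does not spell out PaPaRa's recurrence, so the following is only a minimal scalar sketch of the kind of per-cell dynamic programming work that the GCUPS figures count. It assumes a simplified model: each profile column is a 4-bit mask of allowed nucleotides, the read is aligned end to end while unaligned profile columns at either end are free, and a linear gap penalty with hypothetical scores (`MATCH`, `MISMATCH`, `GAP`) stands in for PaPaRa's actual scoring scheme. The names `align_read_to_profile` and `gcups` are illustrative, not taken from PaPaRa. Each inner-loop iteration is one cell update, so a read of length n against a profile of m columns costs n·m cells; the SSE 4.1 and OpenCL versions described above accelerate exactly this inner recurrence (one common strategy, not necessarily PaPaRa's, is to score several reads or profiles at once, one per vector lane).

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scores; PaPaRa's real scoring scheme differs. */
enum { MATCH = 2, MISMATCH = -1, GAP = -3 };

/* One-hot 4-bit encoding of a nucleotide (A=1, C=2, G=4, T=8). */
static uint8_t base_mask(char b) {
    switch (b) {
    case 'A': return 1;
    case 'C': return 2;
    case 'G': return 4;
    case 'T': return 8;
    default:  return 0; /* unknown bases never match */
    }
}

static int max3(int a, int b, int c) {
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Semi-global score of `read` (length n) against a profile of m columns.
 * prof[j] is an assumed 4-bit mask of the bases allowed in column j.
 * The read is aligned end to end; profile columns before and after it
 * are free. *cells receives the number of DP cells (n * m).
 * Returns INT_MIN if memory allocation fails. */
int align_read_to_profile(const char *read, size_t n,
                          const uint8_t *prof, size_t m, uint64_t *cells) {
    *cells = (uint64_t)n * (uint64_t)m;
    int *row = malloc((m + 1) * sizeof *row);
    if (row == NULL) return INT_MIN;
    for (size_t j = 0; j <= m; j++) row[j] = 0; /* free leading columns */

    for (size_t i = 1; i <= n; i++) {
        uint8_t b = base_mask(read[i - 1]);
        int diag = row[0];          /* H[i-1][0] */
        row[0] = (int)i * GAP;      /* read prefix aligned against gaps */
        for (size_t j = 1; j <= m; j++) {
            int up = row[j];        /* H[i-1][j] */
            int sub = diag + ((b & prof[j - 1]) ? MATCH : MISMATCH);
            row[j] = max3(sub,
                          up + GAP,          /* read base opposite a gap */
                          row[j - 1] + GAP); /* column skipped inside read */
            diag = up;
        }
    }

    int best = row[0];
    for (size_t j = 1; j <= m; j++) /* free trailing columns */
        if (row[j] > best) best = row[j];
    free(row);
    return best;
}

/* Giga cell updates per second: cells / seconds / 10^9. */
double gcups(uint64_t cells, double seconds) {
    return (double)cells / seconds / 1e9;
}
```

For example, aligning the read `ACG` against a profile encoding `TTACGTT` returns 6 (three matches, free flanks) and reports 21 cells; a kernel that processes 18.1 × 10^9 such cells in one second runs at 18.1 GCUPS.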
Finally, we show that overall performance can be substantially increased by designing a hybrid CPU-GPU system with appropriate load distribution mechanisms.
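The abstract does not say which load distribution mechanism the hybrid system uses. As a hypothetical illustration only, the sketch below assumes a static, throughput-proportional split: the CPU receives a leading run of work items whose total cell count does not exceed its share, total × cpu_rate / (cpu_rate + gpu_rate), and the GPU receives the rest. The names `split_work` and `split_time` and the per-item cell counts are assumptions, not part of PaPaRa.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical static split. work[i] is the cell count (read length times
 * profile length) of item i; cpu_rate and gpu_rate are measured GCUPS.
 * Returns k: items [0, k) go to the CPU, items [k, count) to the GPU. */
size_t split_work(const uint64_t *work, size_t count,
                  double cpu_rate, double gpu_rate) {
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) total += work[i];

    /* CPU share of all cells, proportional to its throughput. */
    double cpu_target = (double)total * cpu_rate / (cpu_rate + gpu_rate);
    uint64_t assigned = 0;
    size_t k = 0;
    while (k < count && (double)(assigned + work[k]) <= cpu_target) {
        assigned += work[k];
        k++;
    }
    return k;
}

/* Estimated wall time (seconds) of split k: both devices run concurrently,
 * so the slower side determines when the batch completes. */
double split_time(const uint64_t *work, size_t count, size_t k,
                  double cpu_rate, double gpu_rate) {
    uint64_t cpu_cells = 0, gpu_cells = 0;
    for (size_t i = 0; i < count; i++) {
        if (i < k) cpu_cells += work[i];
        else       gpu_cells += work[i];
    }
    double t_cpu = (double)cpu_cells / (cpu_rate * 1e9);
    double t_gpu = (double)gpu_cells / (gpu_rate * 1e9);
    return t_cpu > t_gpu ? t_cpu : t_gpu;
}
```

With four equal items and rates of 1 and 3 GCUPS, `split_work` assigns one item to the CPU and three to the GPU, so both sides finish at the same time. A dynamic scheme, where each device pulls the next batch from a shared queue, adapts better when throughput varies; the static split is shown only because it is the simplest to reason about. Note also that the hybrid peak (33.8 GCUPS) is lower than the sum of the separate CPU and GPU peaks (18.1 + 22.1 = 40.2 GCUPS), so rates measured in isolation may overestimate what each device sustains when both run at once.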
This article has been indexed in PubMed and other databases.