All Title Author
Keywords Abstract

-  2017 

The new approach of multiple genome sequence matching based on GPU

Keywords: 后缀数组 后缀树 GPU 基因片段匹配 并行
suffix array suffix tree GPU genome sequence matching parallel

Full-Text   Cite this paper   Add to My Lib


后缀树和后缀数组广泛用于生物信息学领域中,特别是通过启发式算法在对DNA基因片段进行匹配的阶段。本文提出了在GPU 的平台下,利用多核和超多核体系构成的后缀树以及后缀数组并行匹配大规模基因片段,从而加速基因搜索匹配过程。相对于后缀树,后缀数组二分搜素算法具有内存占用少,缓存使用率高等优点。在GPU的性能评估中,后缀数组执行效率明显超过后缀树,后缀数组占用的空间仅为后缀树的20%―30%。相对于CPU的串行实现,后缀树组达到了约99倍的加速比。实验结果表明在基因片段匹配的过程中,基于GPU的后缀数组二分搜索是一种高效且实用的方法。
Suffix trees and suffix arrays have been used widely in bioinformatics applications, especially for DNA sequence alignments in the initial exact match phase of heuristic algorithms. In this paper, a new GPU implementation and optimization of the suffix tree and suffix array on both multi-core and many-core platforms to accelerate multiple genome sequence searching is presented. The comparative performance evaluation between the suffix tree and suffix array is then carried out. The results showed that the suffix array needed only 20?C30% of memory space compared with the suffix tree, and that the mean search time of the suffix array was significantly shorter than the mean search time of the suffix tree because of the use of a binary search with coalesced memory access and tile optimization under the GPU architecture. Moreover, the GPU implementation of the suffix array gained a speedup of approximate 99 times compared with the corresponding CPU serial implementation. This study showed that the massively parallel sequence matching algorithm based on suffix array was an efficient approach with the high-performance in the process of multiple DNA sequence matching


comments powered by Disqus