Journal of Jilin University (Engineering and Technology Edition), 2015

Performance-Portable GPU Parallel Optimization of the Bellman-Ford Algorithm

pp. 1559-1564

Liu Lei (刘磊), Wang Yanyan (王燕燕), Shen Chun (申春), Li Yuxiang (李玉祥), Liu Lei (刘雷)

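Keywords: computer software, Bellman-Ford algorithm, GPU parallel programming and optimization techniques, parallel reduction algorithm, performance portability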


Abstract:

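This paper proposes two performance-portable, GPU-oriented optimizations: a parallel reduction algorithm for finding extrema and a global memory access optimization algorithm. Both are applied to parallelize the Bellman-Ford algorithm, addressing two problems that arise across different types of GPU devices: insufficient parallel granularity and non-contiguous global memory access. Experimental results show that the optimizations are effective on multiple GPU devices from both NVIDIA and AMD, with the optimized program achieving a 3x to 6x performance improvement over the original GPU parallel version.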
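The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a Bellman-Ford variant in which each vertex's new distance is a minimum taken over its incoming edges, and it computes that minimum with a pairwise (tree) reduction of the kind a GPU work-group would run in parallel. The names `tree_min` and `bellman_ford`, and the choice to group edges by destination vertex so that each vertex's candidates sit next to each other, are illustrative assumptions. The code runs sequentially on the CPU.

```python
import math


def tree_min(values):
    """Pairwise (tree) min-reduction over a sequence of numbers.

    Each pass halves the active range, as a GPU work-group would;
    here the inner loop simply runs sequentially.
    """
    buf = list(values)
    if not buf:
        return math.inf
    # Pad to a power of two with +inf, the identity element of min.
    size = 1
    while size < len(buf):
        size *= 2
    buf.extend([math.inf] * (size - len(buf)))
    stride = size // 2
    while stride > 0:
        for i in range(stride):  # on a GPU, these iterations are independent
            buf[i] = min(buf[i], buf[i + stride])
        stride //= 2
    return buf[0]


def bellman_ford(num_vertices, edges, source):
    """Single-source shortest paths; edges is a list of (u, v, weight).

    Raises ValueError if a negative cycle is reachable from the source.
    """
    # Assumption: group incoming edges by destination so that the
    # candidates for one vertex are stored contiguously.
    incoming = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        incoming[v].append((u, w))

    dist = [math.inf] * num_vertices
    dist[source] = 0
    for _ in range(num_vertices - 1):
        # Every vertex reads the previous distances only, so all
        # vertices could be updated in parallel.
        new_dist = [
            min(dist[v], tree_min(dist[u] + w for u, w in incoming[v]))
            for v in range(num_vertices)
        ]
        if new_dist == dist:  # converged early
            break
        dist = new_dist

    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative cycle")
    return dist
```

For example, `bellman_ford(4, [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)], 0)` returns `[0, 3, 1, 4]`. On a real GPU the reduction would typically run in on-chip local (shared) memory, and the work-group size would be tuned per device.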

