OALib Journal (ISSN: 2333-9721)

Performance Prediction Based on Statistics of Sparse Matrix-Vector Multiplication on GPUs

DOI: 10.4236/jcc.2017.56005, PP. 65-83

Keywords: Sparse Matrix-Vector Multiplication, Performance Prediction, GPU, Normal Distribution, Uniform Distribution

Abstract:

Sparse matrix-vector multiplication (SpMV) is one of the most essential operations in linear algebra, and predicting its performance on GPUs has received increasing attention in recent years. In 2012, Guo and Wang proposed a new approach to predicting the performance of SpMV on GPUs. However, their model does not fully account for matrix structure, so the execution times it predicts tend to be inaccurate for general sparse matrices. To address this problem, we propose two new, related models that take the structure of the matrices into account and make performance prediction more accurate. Using the new models, we predict the execution time of SpMV for the CSR-V, CSR-S, ELL and JAD sparse matrix storage formats on the CUDA platform. Our experimental results show that, on average over most general matrices, the predictions of our models are 1.69 times more accurate than those of Guo and Wang's model.
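For readers unfamiliar with the storage formats the abstract names, the sketch below shows the CSR (compressed sparse row) layout that underlies the CSR-S and CSR-V GPU kernels: on a GPU, CSR-scalar assigns one thread per row and CSR-vector one warp per row, and each row's work is the inner loop here. This is a minimal serial illustration, not the paper's implementation; the function name and example matrix are ours.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    values  -- nonzero entries of A, row by row
    col_idx -- column index of each entry in `values`
    row_ptr -- row i's entries occupy values[row_ptr[i]:row_ptr[i+1]]
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        # In CSR-S one GPU thread runs this loop; in CSR-V a warp
        # splits the iterations of the loop and reduces `acc`.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# Example: A = [[10, 0, 2],
#               [ 0, 3, 0],
#               [ 1, 0, 4]]
values  = [10.0, 2.0, 3.0, 1.0, 4.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [12.0, 3.0, 5.0]
```

The ELL and JAD formats pad or reorder rows so that threads in a warp touch contiguous memory; the per-row arithmetic is identical, which is why a statistical model of the row-length distribution can drive predictions for all four formats.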

References

[1]  Bolz, J., Farmer, I., Grinspun, E. and Schroder, P. (2003) Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. ACM Transactions on Graphics, 22, 917-924.
[2]  Bell, N. and Garland, M. (2009) Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, Portland, 14-20 November 2009, Article No. 18.
https://doi.org/10.1145/1654059.1654078
[3]  Li, R.-P. and Saad, Y. (2013) GPU-Accelerated Preconditioned Iterative Linear Solvers. The Journal of Supercomputing, 63, 443-466.
https://doi.org/10.1007/s11227-012-0825-3
[4]  Vazquez, F., Ortega, G., Fernandez, J.J. and Garzon, E.M. (2010) Improving the Performance of the Sparse Matrix Vector Product with GPUs. Proceedings of the 10th IEEE International Conference on Computer and Information Technology, IEEE Computer Society, Bradford, 29 June-1 July 2010, 1146-1151.
[5]  Monakov, A., Lokhmotov, A. and Avetisyan, A. (2010) Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P. and Martorell, X., Eds., High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, Vol. 5952. Springer, Berlin, Heidelberg, 111-125.
https://doi.org/10.1007/978-3-642-11515-8_10
[6]  Zheng, C., Gu, S., Gu, T.-X., Yang, B. and Liu, X.-P. (2014) BiELL: A Bisection ELLPACK Based Storage Format for Optimizing SpMV on GPUs. Journal of Parallel and Distributed Computing, 74, 2639-2647.
https://doi.org/10.1016/j.jpdc.2014.03.002
[7]  Gu, T.-X., Zheng, C., Gu, S. and Liu, X.-P. (2014) Solving Sparse Linear Systems on GPUs Based on the BiELL Storage Format. Proceedings of International Conference on Parallel, Distributed Systems and Software Engineering, Singapore, 96-107.
[8]  Choi, J.W., Singh, A. and Vuduc, R.W. (2010) Model-Driven Autotuning of Sparse Matrix-Vector Multiply on GPUs. ACM SIGPLAN Notices, 45, 115-126.
https://doi.org/10.1145/1837853.1693471
[9]  Guo, P., Wang, L.-Q. and Chen, P. (2014) A Performance Modeling and Optimization Analysis Tool for Sparse Matrix Vector Multiplication on GPUs. IEEE Transactions on Parallel and Distributed Systems, 25, 1112-1123.
https://doi.org/10.1109/TPDS.2013.123
[10]  Resios, A. (2011) GPU Performance Prediction Using Parametrized Models. Master’s Thesis, Utrecht University, Utrecht.
[11]  Dinkins, S. (2012) A Model for Predicting the Performance of Sparse Matrix Vector Multiply (SpMV) Using Memory Bandwidth Requirements and Data Locality. Master’s Thesis, Colorado State University, Fort Collins.
[12]  van Werkhoven, B., Maassen, J., Seinstra, F.J. and Bal, H.E. (2014) Performance Model for CPU-GPU Data Transfers. 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Chicago, 26-29 May 2014, 11-20.
[13]  Baghsorkhi, S.S., Delahaye, M., Gropp, W.D. and Hwu, W.-M.W. (2009) Analytical Performance Prediction for Evaluation and Tuning of GPGPU Applications. Workshop on Exploiting Parallelism Using GPUs and Other Hardware-Assisted Methods (EPHAM’09), in conjunction with the 2009 International Symposium on Code Generation and Optimization (CGO), Seattle, Washington.
[14]  Hong, S. and Kim, H. (2009) An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness. Proceedings of the 36th Annual International Symposium on Computer Architecture, Austin, 20-24 June 2009, 152-163.
https://doi.org/10.1145/1555754.1555775
[15]  Schaa, D. and Kaeli, D. (2009) Exploring the Multiple-GPU Design Space. Proceedings of the 2009 IEEE International Parallel & Distributed Processing Symposium, Rome, 23-29 May 2009, 1-12.
https://doi.org/10.1109/IPDPS.2009.5161068
[16]  Guo, P. and Wang, L.-Q. (2012) Accurate CUDA Performance Modeling for Sparse Matrix-Vector Multiplication. Proceeding of the 2012 International Conference on High Performance Computing and Simulation (HPCS), Madrid, 2-6 July 2012, 496-502.
https://doi.org/10.1109/HPCSim.2012.6266964
[17]  Davis, T. and Hu, Y. (2016) The University of Florida Sparse Matrix Collection.
http://www.cise.ufl.edu/research/sparse/matrices
[18]  Nvidia Corporation (2011) NVIDIA CUDA Programming Guide. Santa Clara, Nvidia Corporation, USA.
[19]  Saad, Y. (2003) Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, Philadelphia.
https://doi.org/10.1137/1.9780898718003
