Autotuning Strategies For Reducing Synchronization Costs In Multithreaded Kernels

Keywords: Compiler Design, Parallelism, Software Infrastructure

Abstract:

The emergence of multicore architectures has opened up new opportunities for thread-level parallelism and dramatically increased the theoretical peak performance of current systems. However, achieving a high fraction of that peak requires careful orchestration of many architecture-sensitive parameters, both on-chip and across the interconnect. In particular, the presence of shared caches on multicore architectures makes it necessary to consider thread synchronization and data locality in concert. This paper studies the complex interaction among several compiler-level code transformations that affect data locality, achieved parallelism, and synchronization and communication costs. We characterize this interaction using static analysis and generate a search space suitable for efficient automatic performance tuning. We also develop a heuristic, based on the number of threads, data reuse patterns, and the size and configuration of the shared cache, to estimate the optimal synchronization interval for pipeline-parallelized code. We validate our choice of tuning parameters and evaluate our heuristic with experiments on a set of scientific kernels on four different multicore platforms. The results show that the proposed heuristic estimates the optimal synchronization window with reasonable accuracy and achieves significant performance improvement.
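
To make the trade-off behind the synchronization interval concrete, the following is a minimal illustrative sketch, not the heuristic developed in the paper. It assumes a textbook pipeline cost model in which each thread processes `window` columns between synchronizations, and a simple cap that keeps all threads' in-flight columns inside the shared cache. Every name and parameter here (`pipeline_time`, `max_window_for_cache`, `best_window`, `work_per_col`, `sync_cost`, `bytes_per_col`) is hypothetical and chosen only for illustration.

```python
import math


def pipeline_time(n_cols, n_threads, window, work_per_col, sync_cost):
    """Toy cost of a pipelined sweep (an assumed model, not the paper's).

    Each thread processes `window` columns, then synchronizes with its
    neighbor. Filling and draining the pipeline adds n_threads - 1 steps.
    """
    chunks = math.ceil(n_cols / window)
    steps = chunks + n_threads - 1
    return steps * (window * work_per_col + sync_cost)


def max_window_for_cache(cache_bytes, n_threads, bytes_per_col):
    """Largest window whose in-flight data for all threads fits the shared cache."""
    return max(1, cache_bytes // (n_threads * bytes_per_col))


def best_window(n_cols, n_threads, work_per_col, sync_cost,
                cache_bytes, bytes_per_col):
    """Exhaustively search the cache-limited range for the cheapest window."""
    limit = min(n_cols,
                max_window_for_cache(cache_bytes, n_threads, bytes_per_col))
    return min(
        range(1, limit + 1),
        key=lambda w: pipeline_time(n_cols, n_threads, w,
                                    work_per_col, sync_cost),
    )


if __name__ == "__main__":
    # Hypothetical inputs: 1024 columns, 4 threads, 1 MiB shared cache.
    w = best_window(1024, 4, work_per_col=1.0, sync_cost=50.0,
                    cache_bytes=1 << 20, bytes_per_col=64)
    print("chosen window:", w)
```

Under this assumed model, a small window pays the synchronization cost too often, while a large window delays downstream threads during pipeline fill and may overflow the shared cache. An autotuner would replace the modeled cost with measured run times over the same search range.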
