
Autotuning Strategies For Reducing Synchronization Costs In Multithreaded Kernels

Keywords: Compiler Design, Parallelism, Software Infrastructure


Abstract:

The emergence of multicore architectures has opened up new opportunities for thread-level parallelism and dramatically increased the theoretical peak performance of current systems. However, achieving a high fraction of that peak requires careful orchestration of many architecture-sensitive parameters, both on-chip and across the interconnect. In particular, the presence of shared caches on multicore architectures makes it necessary to consider thread synchronization and data locality together. This paper studies the complex interaction among several compiler-level code transformations that affect data locality, achieved parallelism, and synchronization and communication costs. We characterize this interaction using static analysis and generate a search space suitable for efficient automatic performance tuning. We also develop a heuristic, based on the number of threads, data reuse patterns, and the size and configuration of the shared cache, to estimate the optimal synchronization interval for pipeline-parallelized code. We validate our choice of tuning parameters and evaluate our heuristic with experiments on a set of scientific kernels on four different multicore platforms. The results show that the proposed heuristic estimates the optimal synchronization window with reasonable accuracy and yields significant performance improvement.
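
The abstract does not spell out the heuristic itself, so the following is only an illustrative sketch of how a synchronization interval could be derived from the three inputs it names. It assumes a simple capacity rule: pick the largest number of iterations between synchronization points such that the data touched by all threads in that window still fits in the shared cache. The function name, its parameters, and the capacity rule are all hypothetical and are not taken from the paper.

```python
def estimate_sync_interval(cache_bytes, num_threads,
                           fresh_bytes_per_iteration, max_interval=1024):
    """Hypothetical capacity-based estimate of a synchronization interval.

    cache_bytes: capacity of the shared cache (assumed fully usable).
    num_threads: threads sharing that cache.
    fresh_bytes_per_iteration: new data one thread touches per iteration,
        after accounting for reuse (an assumed summary of the reuse pattern).
    max_interval: upper bound, so a large cache does not serialize the pipeline.
    """
    if cache_bytes <= 0 or num_threads <= 0 or fresh_bytes_per_iteration <= 0:
        raise ValueError("all size and thread arguments must be positive")
    if max_interval < 1:
        raise ValueError("max_interval must be at least 1")

    # Footprint added to the shared cache per iteration across all threads.
    footprint_per_iteration = num_threads * fresh_bytes_per_iteration

    # Largest window whose combined footprint still fits in the cache.
    interval = cache_bytes // footprint_per_iteration

    # Always synchronize at least once per iteration, never beyond the cap.
    return max(1, min(interval, max_interval))


if __name__ == "__main__":
    # Example: 8 MiB shared cache, 4 threads, 4 KiB of fresh data per iteration.
    print(estimate_sync_interval(8 * 1024 * 1024, 4, 4096))
```

In an autotuning setting, an estimate of this kind would more plausibly serve as a starting point that narrows the search space than as a final answer, since a capacity rule ignores associativity, cache-line granularity, and the cost of the synchronization primitive itself.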
