%0 Journal Article
%T Reducing communication in parallel graph search algorithms with software caches
%A Laura Carrington
%A Manu Shantharam
%A Pietro Cicotti
%J The International Journal of High Performance Computing Applications
%@ 1741-2846
%D 2019
%R 10.1177/1094342018762510
%X In many scientific and computational domains, graphs are used to represent and analyze data. Such graphs often exhibit the characteristics of small-world networks: few high-degree vertexes connect many low-degree vertexes. Despite the randomness in a graph search, it is possible to capitalize on the characteristics of small-world networks and cache relevant information of high-degree vertexes. We applied this idea by caching remote vertex ids in a parallel breadth-first search benchmark. Our experiment with different implementations demonstrated significant performance improvements over the reference implementation in several configurations, using 64 to 1024 cores. We proposed a system design in which resources are dedicated exclusively to caching and shared among a set of nodes. Our evaluation demonstrates that this design reduces communication and has the potential to improve performance on large-scale systems in which the communication cost increases significantly with the distance between nodes. We also tested a memcached system as the cache server finding that its generic protocol, which does not match our usage semantics, hinders significantly the potential performance improvements and suggested that a generic system should also support a basic and lightweight communication protocol to meet the needs of high-performance computing applications. Finally, we explored different configurations to find efficient ways to utilize the resources allocated to solve a given problem size; to this extent, we found utilizing half of the compute cores per allocated node improves performance, and even in this case, caching variants always outperform the reference implementation
%K Computer architecture
%K accelerator architectures
%K parallel processing
%K parallel algorithm
%K data system
%K data analysis
%U https://journals.sagepub.com/doi/full/10.1177/1094342018762510