%0 Journal Article %T Partitioning Strategies for Load Balancing Subgraph-centric Distributed Graph Processing %A Neel Choudhury %A Ravikant Dindokar %A Akshay Dixit %A Yogesh Simmhan %J Computer Science %D 2015 %I arXiv %X Distributed graph processing platforms have helped emerging application domains use commodity clusters and Clouds to analyze large graphs. Vertex-centric programming models like Google Pregel, and their subgraph-centric variants, specify data-parallel application logic for a single vertex or component that execute iteratively. The locality and balancing of components within partitions affects the performance of such platforms. We propose three partitioning strategies for a subgraph-centric model, and analyze their impact on CPU utilization, communication, iterations, and makespan. We analyze these using Breadth First Search and PageRank algorithms on powerlaw and spatio-planar graphs. They are validated on a commodity cluster using our GoFFish subgraph-centric platform, and compared against Apache Giraph vertex-centric platform. Our experiments show upto 8 times improvement in utilization resulting to upto 5 times improvement of overall makespan for flat and hierarchical partitioning over the default strategy due to improved machine utilization. Further, these also exhibit better horizontal scalability relative to Giraph. %U http://arxiv.org/abs/1508.04265v1