%0 Journal Article
%T Multi GPU Performance of Conjugate Gradient Algorithm with Staggered Fermions
%A Hyung-Jin Kim
%A Weonjong Lee
%J Physics
%D 2010
%I arXiv
%X We report the results of a GPU performance test using the conjugate gradient (CG) algorithm for staggered fermions on the MILC fine lattice ($28^3 \times 96$). The test uses nVIDIA GTX 295 GPUs. With MPI communication turned off and only a single GPU in use, the performance is 35 gigaflops in double precision, which corresponds to 47% of the peak. With MPI communication turned on and multiple GPUs in use, the performance drops to 12.3 gigaflops. Data transfer through the InfiniBand network and the PCI-E bus I/O is a major bottleneck. We suggest two potential ways to optimize the data transfer.
%U http://arxiv.org/abs/1010.4782v2