Physics  2013 

Implementation of the twisted mass fermion operator in the QUDA library

We discuss an extension of the QUDA library to support the Wilson twisted mass operator. A performance analysis is presented for both degenerate and non-degenerate flavor doublets. On recent NVIDIA Kepler GPUs, the degenerate twisted mass fermion operator runs at up to 190, 487 and 856 Gflops in double, single and half precision, respectively, while our implementation for the non-degenerate flavor doublet reaches 163, 516 and 879 Gflops, respectively. The code is currently in production for hadron structure studies.
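
For orientation, the two operators discussed above are commonly written as follows. This is only a sketch in one widely used convention (twisted basis, with the Wilson operator $D_W$, bare mass $m_0$, and Pauli matrices $\tau^a$ acting in flavor space). The symbols $\mu$, $\bar\mu$ and $\bar\epsilon$, as well as the sign and flavor-basis choices, are assumptions made here and may differ from those used in the paper or in QUDA.

\[
D_{\mathrm{tm}} = D_W + m_0 + i\,\mu\,\gamma_5\,\tau^3 \qquad \text{(degenerate doublet)}
\]
\[
D_{\mathrm{nd}} = D_W + m_0 + i\,\bar\mu\,\gamma_5\,\tau^3 + \bar\epsilon\,\tau^1 \qquad \text{(non-degenerate doublet)}
\]

In this form the degenerate operator is diagonal in flavor space, so the two flavors can be treated independently, whereas the $\bar\epsilon\,\tau^1$ term of the non-degenerate operator couples the two flavors and the operator must act on the full doublet.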


