%0 Journal Article
%T Acceleration of tensor
%A Ali Karakus
%A Kasia Świrydowicz
%A Noel Chalmers
%A Tim Warburton
%J The International Journal of High Performance Computing Applications
%@ 1741-2846
%D 2019
%R 10.1177/1094342018816368
%X This article is devoted to graphics processing unit (GPU) kernel optimization and performance analysis of three tensor-product operations arising in finite element methods. We provide a mathematical background to these operations and implementation details. Achieving close to peak performance for these operators requires extensive optimization because of the operators' properties: low arithmetic intensity, tiered structure, and the need to store intermediate results during the kernel execution. We give a guided overview of optimization strategies and we present a performance model that allows us to compare the efficacy of these optimizations against an empirically calibrated roofline
%K Finite element method
%K elliptic problem
%K hexahedral elements
%K matrix–vector product
%K GPU tensor operations
%K NVIDIA Tesla P100
%U https://journals.sagepub.com/doi/full/10.1177/1094342018816368