j-lanes hashing is a tree mode that splits an input message into j slices, computes j independent digests (one per slice), and outputs the hash value of their concatenation. We demonstrate the performance advantage of j-lanes hashing on SIMD architectures by coding a 4-lanes SHA-256 implementation and measuring its performance on the latest 3rd Generation Intel® Core™ processors. For messages whose lengths range from 2 KB to 132 KB, we show that the 4-lanes SHA-256 is between 1.5 and 1.97 times faster than the fastest publicly available implementation that we are aware of, and between ~2 and ~2.5 times faster than the OpenSSL 1.0.1c implementation. For long messages, there is no significant performance difference between different choices of j. We show that the 4-lanes SHA-256 is faster than the two SHA-3 finalists (BLAKE and Keccak) that have a published tree mode implementation. Finally, we explain why j-lanes hashing will be faster on the coming AVX2 architecture, which facilitates using 256-bit registers. These results suggest that standardizing a tree mode for hash functions (SHA-256 in particular) could be useful for performance-hungry applications.
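The split, digest, and wrap structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `j_lanes_sha256` is hypothetical, and the contiguous near-equal slicing, the standard SHA-256 padding, and the standard IV are simplifying assumptions, so the output will not match any standardized j-lanes digest. The lanes are also computed one after another here, whereas the reported speedups come from processing them together in SIMD registers.

```python
import hashlib


def j_lanes_sha256(message: bytes, j: int = 4) -> bytes:
    """Hash `message` as j slices, then hash the concatenated slice digests.

    Illustrative only: the slicing rule, padding, and IVs are assumptions,
    not a specification of j-lanes hashing.
    """
    if j < 1:
        raise ValueError("j must be at least 1")

    # Assumption: contiguous slices of (nearly) equal length.
    slice_len = -(-len(message) // j)  # ceiling division
    slices = [message[i * slice_len:(i + 1) * slice_len] for i in range(j)]

    # One independent digest per slice; no lane depends on another lane.
    lane_digests = [hashlib.sha256(s).digest() for s in slices]

    # Final step: hash the concatenation of the j lane digests.
    return hashlib.sha256(b"".join(lane_digests)).digest()
```

With j = 1 the sketch reduces to hashing the SHA-256 digest of the whole message, which makes the wrapping step easy to check by hand.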
j-lanes tree hashing is a tree mode
that splits an input message into j slices, computes j independent digests (one per slice), and
outputs the hash value of their concatenation. j-pointers tree hashing is a
similar tree mode that receives, as input, j pointers to j messages (or slices of a single message),
computes their digests and outputs the hash value of their concatenation. Such
modes expose parallelization opportunities in a hashing process that is
otherwise serial by nature. As a result, they have a performance advantage on
modern processor architectures. This paper provides precise specifications for
these hashing modes, proposes appropriate IVs, and demonstrates their
performance on the latest processors. We hope that this work will be useful for the standardization
of these modes.
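A minimal sketch of the j-pointers variant follows, to show how independent lanes expose parallelism. It rests on several assumptions: the name `j_pointers_sha256` is hypothetical, Python `bytes` objects stand in for the j pointers, the standard SHA-256 IV is used in place of the IVs proposed in the paper, and a thread pool stands in for the SIMD lanes that a real implementation would use.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence


def j_pointers_sha256(buffers: Sequence[bytes]) -> bytes:
    """Digest each of the j buffers independently, then hash the digests.

    Illustrative only: IVs and padding are the standard SHA-256 ones,
    not those of any j-pointers specification.
    """
    if not buffers:
        raise ValueError("at least one buffer is required")

    # The lanes share no state, so they may run concurrently.
    # pool.map returns results in input order, which keeps the
    # concatenation order deterministic.
    with ThreadPoolExecutor(max_workers=len(buffers)) as pool:
        lane_digests = list(
            pool.map(lambda buf: hashlib.sha256(buf).digest(), buffers)
        )

    # Final step: hash the concatenation of the j lane digests.
    return hashlib.sha256(b"".join(lane_digests)).digest()
```

The result depends on the order of the buffers, so callers must agree on that order for digests to be comparable.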