%0 Journal Article %T Montgomery Modular Multiplication on Reconfigurable Hardware: Systolic versus Multiplexed Implementation %A Guilherme Perin %A Daniel Gomes Mesquita %A Jo£¿o Baptista Martins %J International Journal of Reconfigurable Computing %D 2011 %I Hindawi Publishing Corporation %R 10.1155/2011/127147 %X This paper describes a comparison of two Montgomery modular multiplication architectures: a systolic and a multiplexed. Both implementations target FPGA devices. The modular multiplication is employed in modular exponentiation processes, which are the most important operations of some public-key cryptographic algorithms, including the most popular of them, the RSA. The proposed systolic architecture presents a high-radix implementation with a one-dimensional array of Processing Elements. The multiplexed implementation is a new alternative and is composed of multiplier blocks in parallel with the new simplified Processing Elements, and it provides a pipelined operation mode. We compare the time ¡Á area efficiency for both architectures as well as an RSA application. The systolic implementation can run the 1024 bits RSA decryption process in just 3.23£¿ms, and the multiplexed architecture executes the same operation in 4.36£¿ms, but the second approach saves up to 28% of logical resources. These results are competitive with the state-of-the-art performance. 1. Introduction Modular multiplication is widely employed in public-key cryptography, especially where modular exponentiation is essential. For instance, the most commonly used asymmetric cryptographic algorithm is the RSA [1]. The RSA security depends on the difficulty of factoring large numbers. Here, large numbers mean prime numbers of up to 4096£¿bits, used as cryptographic keys. In this cryptosystem the main operation is the modular exponentiation using the public and private keys, the first to encrypt and the second to decrypt messages. So, the performance of the whole system depends on the efficiency of modular arithmetic implementations. As modular operations are time consuming, it is common to use hardware devices to perform both the modular multiplication and the exponentiation. Among the hardware approaches, the increased use of reconfigurable devices to implement cryptographic operations, especially the FPGAs, is evident. One of the most suitable methods for performing modular multiplications in hardware is the Montgomery multiplication [2]. This algorithm is fast and power efficient in hardware implementations. Assuming the modular multiplication as , the Montgomery multiplication avoids the division by by replacing the division by right shifts. Also, this method allows the use of multi-precision arithmetic, which is useful for employing high-radix operations. High-radix operations in turn make it easier to develop modular multiplication architectures. Aiming to implement RSA systems based on %U http://www.hindawi.com/journals/ijrc/2011/127147/