This paper describes a comparison of two Montgomery modular multiplication architectures: a systolic and a multiplexed. Both implementations target FPGA devices. The modular multiplication is employed in modular exponentiation processes, which are the most important operations of some public-key cryptographic algorithms, including the most popular of them, the RSA. The proposed systolic architecture presents a high-radix implementation with a one-dimensional array of Processing Elements. The multiplexed implementation is a new alternative and is composed of multiplier blocks in parallel with the new simplified Processing Elements, and it provides a pipelined operation mode. We compare the time × area efficiency for both architectures as well as an RSA application. The systolic implementation can run the 1024 bits RSA decryption process in just 3.23?ms, and the multiplexed architecture executes the same operation in 4.36?ms, but the second approach saves up to 28% of logical resources. These results are competitive with the state-of-the-art performance. 1. Introduction Modular multiplication is widely employed in public-key cryptography, especially where modular exponentiation is essential. For instance, the most commonly used asymmetric cryptographic algorithm is the RSA [1]. The RSA security depends on the difficulty of factoring large numbers. Here, large numbers mean prime numbers of up to 4096?bits, used as cryptographic keys. In this cryptosystem the main operation is the modular exponentiation using the public and private keys, the first to encrypt and the second to decrypt messages. So, the performance of the whole system depends on the efficiency of modular arithmetic implementations. As modular operations are time consuming, it is common to use hardware devices to perform both the modular multiplication and the exponentiation. Among the hardware approaches, the increased use of reconfigurable devices to implement cryptographic operations, especially the FPGAs, is evident. One of the most suitable methods for performing modular multiplications in hardware is the Montgomery multiplication [2]. This algorithm is fast and power efficient in hardware implementations. Assuming the modular multiplication as , the Montgomery multiplication avoids the division by by replacing the division by right shifts. Also, this method allows the use of multi-precision arithmetic, which is useful for employing high-radix operations. High-radix operations in turn make it easier to develop modular multiplication architectures. Aiming to implement RSA systems based on
References
[1]
R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Communications of the ACM, vol. 21, no. 2, pp. 120–126, 1978.
[2]
P. Montgomery, “Modular multiplication without trial division,” Mathematics of Computation, vol. 44, pp. 519–521, 1985.
[3]
T. Blum and C. Paar, “Montgomery modular exponentiation on reconfigurable hardware,” in Proceedings of the Symposium on Computer Arithmetic (ARITH '99), pp. 70–77, IEEE Computer Society, Adelaide, Australia, 1999.
[4]
N. Pinckney and D. M. Harris, “Parallelized radix-4 scalable montgomery multipliers,” in Proceedings of the 20th Symposium on Integrated Circuits and System Design (SBCCI '07), pp. 306–311, 2007.
[5]
C. McIvor and J. V. McCanny, “High-radix systolic modular multiplication on reconfigurable hardware,” in Proceedings of the IEEE International Conference on Field Programmable Technology, vol. 2005, pp. 13–18, 2005.
[6]
C. D. Walter, “Systolic modular multiplication,” IEEE Transactions on Computers, vol. 42, no. 3, pp. 376–378, 1993.
[7]
F. Bernard, “Scalable hardware implementing high-radix Montgomery multiplication algorithm,” Journal of Systems Architecture, vol. 53, no. 2-3, pp. 117–126, 2007.
[8]
Y. Wang, D. L. Maskell, and J. Leiwo, “A unified architecture for a public key cryptographic coprocessor,” Journal of Systems Architecture, vol. 54, no. 10, pp. 1004–1016, 2008.
[9]
A. Daly and W. Marnane, “Efficient architectures for implementing montgomery modular multiplication and RSA modular exponentiation on reconfigurable logic,” in Proceedings of the ACM/SIGDA 10th International Symposium on Field Programmable Gate Arrays (FPGA '02), pp. 40–49, 2002.
[10]
D. Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, Addison-Wesley, Reading, Mass, USA, 1951.
[11]
Xilinx, “Virtex-5 user guide,” March 2006, http://www.xilinx.com.
[12]
G. Perin, D. G. Mesquita, F. L. Herrmann, and J. B. Martins, “Montgomery modular multiplication on reconfigurable hardware: fully systolic array vs parallel implementation,” in Proceedings of the 6th Southern Programmable Logic Conference (SPL '10), pp. 61–66, 2010.
[13]
A. Menezes, Handbook of Applied Cryptography, CRC Press, New York, NY, USA, 5th edition, 2001.
[14]
H. Orup, “Simplifying quotient determination in high-radix modular multiplication,” in Proceedings of the 12th Symposium on Computer Arithmetic (ARITH '95), pp. 193–199, July 1995.
[15]
C. D. Walter, “Montgomery exponentiation needs no final subtractions,” Electronics Letters, vol. 35, no. 21, pp. 1831–1832, 1999.
[16]
A. F. Tenca and C. K. Ko?, “A scalable architecture for modular multiplication based on montgomery's algorithm,” IEEE Transactions on Computers, vol. 52, no. 9, pp. 1215–1220, 2003.
[17]
D. Harris, R. Krishnamurthy, M. Anders, S. Mathew, and S. Hsu, “An improved unified scalable radix-2 montgomery multiplier,” in Proceedings of the Symposium on Computer Arithmetic (ARITH '05), pp. 172–178, 2005.
[18]
M. Joye and S. M. Yen, “The montgomery powering ladder,” in Proceedings of the 4th International Workshop on Cryptographic Hardware and Embedded Systems (CHES '02), vol. 2523 of Lecture Notes in Computer Science, pp. 291–302, 2003.