A new RSA cryptosystem hardware design based on Montgomery's algorithm

Montgomery modular multiplier architectures and hardware implementations for an RSA cryptosystem

2003

This paper describes and analyses the Montgomery Multiplication Algorithm and proposes two scalable, systolic architectures and hardware implementations based on it for use in an RSA module. The Conventional Architecture uses the original version of the Montgomery Multiplication Algorithm, and the Optimized Architecture uses a modified version. The second architecture is considerably better than the first. Both architectures follow carry-save redundant logic and, in comparison with other known architectures, give interesting results in terms of clock frequency, multiplication time, and chip area.
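For orientation, the textbook radix-2 (bit-serial) form of Montgomery multiplication can be sketched as below. This is a minimal software illustration, not the paper's systolic design; the function name `mont_mul_radix2` and the parameter `k` (operand width in bits) are assumptions introduced here.

```python
def mont_mul_radix2(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k) mod n, for odd n and 0 <= a, b < n < 2^k."""
    if n % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    s = 0
    for i in range(k):
        # Add b when bit i of a is set.
        if (a >> i) & 1:
            s += b
        # Make the running sum even by adding n when needed,
        # so the division by two below is exact.
        if s & 1:
            s += n
        s >>= 1
    # The loop keeps s below 2n, so one conditional subtraction suffices.
    return s - n if s >= n else s


# Usage: compare against a direct computation (Python 3.8+ for pow(x, -1, n)).
n, k = 239, 8
expected = (100 * 200 * pow(pow(2, k, n), -1, n)) % n
print(mont_mul_radix2(100, 200, n, k) == expected)
```

Each loop iteration maps naturally to one clock cycle in a bit-serial hardware multiplier, which is why the cycle count of such designs tends to track the operand length.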

Fast Montgomery modular multiplication and RSA cryptographic processor architectures

The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003

New, generic silicon architectures for implementing Montgomery's multiplication algorithm are presented. These use Carry Save Adders (CSAs) to perform the large word length additions required by this algorithm when used for RSA encryption and decryption. It is shown that using a four-to-two CSA with two extra registers rather than a five-to-two CSA leads to a useful reduction in the critical path of the multiplier, albeit at the expense of a small increase in circuitry. For operand lengths of 1536-bits and greater, the percentage gain in data throughput rate outweighs the percentage increase in silicon area. Moreover, for a 2048-bit operand length, typical of what is required in many future generation applications, the gain in data throughput is 27.9% compared with a 9.9% increase in area. The practical application of this approach has been demonstrated by applying this to the design of RSA processor architectures with 512-bit and 1024-bit key sizes. The resulting Montgomery multiplier and RSA processor performance results presented are the fastest reported to date in the literature.
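The role of a carry-save adder can be illustrated in software. The sketch below models a three-to-two CSA on Python integers and builds a four-to-two compressor from two such levels; this is one common construction and is assumed here for illustration, not taken from the paper's circuit. The names `csa_3to2` and `csa_4to2` are hypothetical.

```python
def csa_3to2(x: int, y: int, z: int) -> tuple[int, int]:
    """Compress three addends into a sum word and a carry word.

    No carry propagates between bit positions: each bit is an independent
    full adder, and the carry word is shifted left by one position.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def csa_4to2(w: int, x: int, y: int, z: int) -> tuple[int, int]:
    """Compress four addends into two using two 3-to-2 levels."""
    s1, c1 = csa_3to2(w, x, y)
    return csa_3to2(s1, c1, z)


# Usage: the two output words always add up to the sum of the inputs.
s, c = csa_4to2(1234, 5678, 91011, 121314)
print(s + c == 1234 + 5678 + 91011 + 121314)
```

Because the delay of each level is that of a single full adder regardless of word length, the expensive carry-propagating addition can be deferred to the very end of a long computation.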

Efficient RSA Crypto Processor Using Montgomery Multiplier in FPGA

Advances in Intelligent Systems and Computing, 2019

With the advancement of technology in data communication, security plays a major role in protecting users' data from adversaries. Cryptography is a technique which consists of various algorithms to provide secure communication during data transfer. One of the most widely used and highly secure algorithms is RSA (Rivest, Shamir and Adleman). This research focuses on designing an efficient crypto processor for RSA on the Nexys4, a ready-to-use FPGA (Field Programmable Gate Array) board. There are various techniques for implementing the RSA algorithm on hardware platforms. The primary aim is a high-performance implementation, achieved using the Montgomery multiplication technique. RSA is a public-key cryptosystem which involves generating a public key for encryption and a private key for decryption. The building blocks of RSA include: two multiplier blocks, two blocks to verify the primality of random numbers, one GCD (Greatest Commo...

FPGA Implementation of Modified Serial Montgomery Modular Multiplication for 2048-bit RSA Cryptosystems

RSA (Rivest, Shamir, Adleman) is one of the most widely used cryptographic algorithms worldwide for data encryption and decryption. An essential step in RSA computation is modular multiplication, which is relatively expensive and time-consuming to implement in hardware. This paper proposes two modular multiplication architectures based on a modified serial Montgomery algorithm for 2048-bit RSA. By restricting the modulus to integers from the sequence A094358, very simple and fast modular multiplication hardware can be developed. The first architecture, which incorporates 2048-bit adders, performs better in terms of latency (19010 Logic Cells, 2048 clock cycles or 0.0022 s), while the second architecture, which uses multiple smaller 128-bit adders, consumes less area (8926 Logic Cells, 36864 clock cycles or 0.0031 s). Area multiplied by squared latency (AT²) can be used as a trade-off parameter for choosing the most suitable design for a given need. For prototyping purposes, we have successfully synthesized and implemented our proposed designs, written in VHDL, using Altera Quartus II with a Cyclone II EP2C70F896C6 FPGA as the target board.
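The area-times-squared-latency measure can be evaluated directly from the figures quoted in this abstract. The sketch below assumes area is taken in Logic Cells and latency in seconds; the dictionary labels and the helper `at2` are names introduced here for illustration.

```python
# (area in Logic Cells, latency in seconds), as quoted in the abstract.
designs = {
    "2048-bit adders": (19010, 0.0022),
    "128-bit adders": (8926, 0.0031),
}


def at2(area: float, latency: float) -> float:
    """Area multiplied by squared latency; lower is better."""
    return area * latency ** 2


for name, (area, latency) in designs.items():
    print(f"{name}: AT^2 = {at2(area, latency):.4f}")

best = min(designs, key=lambda name: at2(*designs[name]))
print("lowest AT^2:", best)
```

With these inputs the values come out at roughly 0.092 and 0.086 respectively, so under this particular metric the smaller design edges ahead even though it is slower.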

A Survey on Hardware Architectures for Montgomery Modular Multiplication Algorithm

The motivation for studying high-speed and space-efficient algorithms for modular multiplication comes from their applications in public-key cryptography. The Montgomery multiplication algorithm speeds up the modular multiplications and squarings required for exponentiation. This paper presents new architectures for the computation of modular multiplication and exponentiation using the Montgomery multiplication (MM) algorithm. Montgomery modular multiplication (MMM) is one of the fundamental operations used in cryptographic algorithms such as RSA, Diffie-Hellman key distribution and Elliptic Curve cryptosystems. In this paper we compare the new hardware architectures that are able to perform Montgomery multiplication.
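How Montgomery multiplication serves exponentiation can be sketched with a left-to-right square-and-multiply loop in which every product is a Montgomery reduction. This is a generic textbook formulation, not any of the surveyed architectures; `mont_exp` and the inner helper `redc` are names assumed here.

```python
def mont_exp(base: int, exp: int, n: int) -> int:
    """Compute base**exp mod n for odd n > 1 using Montgomery reduction."""
    k = n.bit_length()
    r = 1 << k                       # R = 2^k > n
    mask = r - 1
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime == -1 (mod R)

    def redc(t: int) -> int:
        """Return t * R^(-1) mod n for 0 <= t < R * n."""
        m = ((t & mask) * n_prime) & mask
        u = (t + m * n) >> k         # exact division by R
        return u - n if u >= n else u

    r2 = (r * r) % n
    x = redc((base % n) * r2)        # base in Montgomery form
    acc = redc(r2)                   # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = redc(acc * acc)        # square
        if bit == "1":
            acc = redc(acc * x)      # multiply
    return redc(acc)                 # leave Montgomery form


# Usage: agrees with Python's built-in modular exponentiation.
print(mont_exp(7, 65537, 3233) == pow(7, 65537, 3233))
```

The conversions into and out of Montgomery form happen once per exponentiation, so their cost is amortized over the many squarings and multiplications in between.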

High-Speed High-Throughput VLSI Architecture for RSA Montgomery Modular Multiplication with Efficient Format Conversion

Journal of The Institution of Engineers (India): Series B, 2019

Modular multiplication is a key operation in RSA cryptosystems. Modular multipliers can be realized using the Montgomery algorithm, and employing carry-save adders makes Montgomery modular multiplication efficient. Montgomery modular multiplication can be carried out in two ways. In the first, all operands are kept in carry-save form. In the second, the input and output are kept in binary form and intermediate operands are kept in carry-save form, which requires an efficient format converter. This paper proposes a fast, high-throughput Montgomery modular multiplier which employs an efficient format conversion method. Format conversion is carried out by a format conversion unit consisting of a carry look-ahead unit and a multiplexer unit. In addition, the multiplier merges two iterations, which reduces the number of clock cycles significantly. Merging iterations requires integer multiples of the inputs, which are computed using the same format converter. The critical path delay of the multiplier is minimized by multiplying one of the inputs by four, which simplifies the necessary intermediate calculations. The total time required for one complete multiplication is significantly reduced because fewer clock cycles are needed at an optimum critical path delay. Experimental results show that the proposed multiplier achieves significant speed and throughput improvements compared with previous designs.
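The second approach, with binary inputs and outputs and carry-save intermediates, can be modelled in a few lines. The sketch below is a plain radix-2 version without the merged iterations or the carry look-ahead converter described in the abstract; the final `s + c` stands in for the format conversion step, and all names are assumptions made for illustration.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """Three-to-two carry-save adder on integers."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def mont_mul_carry_save(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k) mod n with the accumulator held as (s, c)."""
    s, c = 0, 0
    for i in range(k):
        addend = b if (a >> i) & 1 else 0
        # The low bit of s + c + addend is the XOR of the three low bits,
        # so the quotient bit needs no carry propagation.
        q = (s ^ c ^ addend) & 1
        s, c = csa(s, c, addend)
        s, c = csa(s, c, n if q else 0)
        # The total is now even and c has a zero low bit, so s does too:
        # both words can be shifted right without losing information.
        s >>= 1
        c >>= 1
    result = s + c                   # format conversion: one real addition
    return result - n if result >= n else result


# Usage: matches a direct computation of a * b * 2^(-k) mod n.
n, k = 239, 8
expected = (100 * 200 * pow(pow(2, k, n), -1, n)) % n
print(mont_mul_carry_save(100, 200, n, k) == expected)
```

Only the last line of the function performs a carry-propagating addition, which is the step a dedicated format conversion unit would accelerate in hardware.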

Hardware Implementation of Montgomery Modular Multiplication Algorithm Using Iterative Architecture

Modular multiplication is an integral part of RSA cryptosystems, and its performance heavily determines the performance of the encryption hardware. This paper provides a hardware implementation of Montgomery's modular multiplication algorithm using an iterative architecture. The proposed design is implemented in Verilog HDL and simulated functionally using ModelSim Altera 10.1E. Synthesis is performed using Altera Quartus II 9.1 with the Altera DE2-70 as the target FPGA board. The proposed design consumes 17540 logic elements with 15480 LUTs and takes 2048 clock cycles to perform one multiplication. Based on the AT² trade-off measure, the proposed design offers the best performance among the compared designs.

Pipelined VLSI Architecture for RSA Based on Montgomery Modular Multiplication

Modular multiplication is a key operation in many public-key cryptosystems. Montgomery multiplication is one of the well-known algorithms for carrying out modular multiplication more quickly. Carry-save adders are employed to avoid carry propagation at each addition. To reduce the extra clock cycles, a configurable carry-save adder (CCSA) built from either one full adder or two half adders can be employed. In addition, a mechanism has been developed to skip unnecessary carry-save additions in the one-level CCSA while maintaining a short critical path delay. In the proposed architecture, the maximum worst-case delay is analyzed to enhance the throughput. Additional buffers are introduced in the path so that the clock is synchronized, reducing the worst-case delay. As a result, pipelining is introduced, which increases the speed and achieves high throughput. The pipelined architecture is applied to the RSA public-key algorithm to increase the throughput of the RSA cryptosystem.

Reducing Runtime of RSA Processors Based On High-Radix Montgomery Multipliers.

International Journal of Engineering Sciences & Research Technology, 2013

Depending on various requirements, the paper presents an optimized RSA processor that satisfies constraints on circuit area and operating time. We also introduce three multiplier-based datapaths using different intermediate data forms: 1) single form, 2) wide variety of arithmetic components. A total of 242 datapaths for 1024 radix. We can reduce the RSA runtime up to 0.24 ms. As a result, the faste in less than 1.0 ms.

Montgomery and RNS for RSA Hardware Implementation

Computing and Informatics / Computers and Artificial Intelligence, 2010

There are many architectures for RSA hardware implementation which improve its performance. The two main methods for this purpose are Montgomery multiplication and the Residue Number System (RNS). Both are fast methods for converting plaintext to ciphertext in hardware implementations of the RSA algorithm. RNS is faster than Montgomery but uses more area. The goal of this paper is to compare these two methods in terms of speed and area. For this purpose, the best-performing architecture for each method is selected, and some modifications are made to enhance its performance. This comparison can be used to select the appropriate method for hardware implementation in both FPGA and ASIC design.
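The speed-for-area trade-off of RNS comes from splitting one wide multiplication into several narrow, independent ones. The sketch below shows only that channel-wise multiplication and the Chinese Remainder Theorem reconstruction; it omits the modular reduction that a full RNS-based RSA design needs. The moduli and all function names are illustrative assumptions.

```python
from math import prod

# Pairwise coprime channel moduli, chosen here purely for illustration.
MODULI = (251, 253, 255, 256)


def to_rns(x: int) -> tuple[int, ...]:
    """Represent x by its residues in every channel."""
    return tuple(x % m for m in MODULI)


def rns_mul(u: tuple[int, ...], v: tuple[int, ...]) -> tuple[int, ...]:
    """Multiply channel by channel; no carries cross between channels."""
    return tuple((a * b) % m for a, b, m in zip(u, v, MODULI))


def from_rns(residues: tuple[int, ...]) -> int:
    """Reconstruct the integer with the Chinese Remainder Theorem."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)
    return total % big_m


# Usage: valid as long as the true product is below the product of MODULI.
x, y = 12345, 54321
print(from_rns(rns_mul(to_rns(x), to_rns(y))) == x * y)
```

In hardware each channel can have its own small multiplier working in parallel, which is consistent with the abstract's remark that RNS is faster but occupies more area.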