Accelerated modular multiplication algorithm of large word length numbers with a fixed module
A review of modular multiplication methods and respective hardware implementations
Generally speaking, public-key cryptographic systems consist of raising elements of some group, such as GF(2^n), Z/NZ, or elliptic curves, to large powers and reducing the result modulo some given element. This operation is often called modular exponentiation and is performed by repeated modular multiplications. The practicality of a given cryptographic system depends heavily on how fast modular exponentiations are performed. Consequently, it also depends on how efficiently modular multiplications are done, as these are at the base of the computation. This problem has received much attention over the years. Efficient software as well as hardware implementations have been proposed. However, the results are scattered through the literature. In this paper we survey most known and recent methods for efficient modular multiplication, investigating and examining their strengths and weaknesses. For each method presented, we provide an adequate hardware implementation. Summary: An overview of modern cryptographic methods is given.
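The relationship between modular exponentiation and repeated modular multiplication described in this abstract can be illustrated with a minimal sketch of the left-to-right binary (square-and-multiply) method. The function name `mod_exp` is hypothetical, and the sketch is a generic textbook form, not any particular method from the survey.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Left-to-right binary exponentiation: base**exponent mod modulus."""
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive and exponent non-negative")
    result = 1 % modulus
    base %= modulus
    # Scan the exponent bits from most significant to least significant.
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # one modular squaring per bit
        if bit == "1":
            result = (result * base) % modulus    # one extra multiplication per 1 bit
    return result
```

Every loop iteration costs one or two modular multiplications, which is why the speed of the multiplier dominates the cost of the whole exponentiation.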
A New Algorithm for High-Speed Modular Multiplication Design
IEEE Transactions on Circuits and Systems I: Regular Papers, 2009
Modular exponentiation in public-key cryptosystems is usually achieved by repeated modular multiplications on large integers. Designing high-speed modular multiplication is thus crucial to speeding up the decryption/encryption process. In this paper, we first explore how to relax the data dependency that exists between multiplication, quotient determination, and modular reduction in the conventional Montgomery modular multiplication algorithm. Then, we propose a new modular multiplication algorithm for high-speed hardware design. The speed improvement is achieved by reducing the critical path delay from that of a 4-to-2 carry-save addition to that of a 3-to-2 carry-save addition. The resulting time complexity of our development is further decreased by simultaneously performing the multiplication and modular reduction processes. Experimental results show that the developed modular multiplication can operate at speeds higher than those of related work. When the proposed modular multiplication is applied to modular exponentiation, both time and area-time advantages are obtained.
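The three dependent steps named in this abstract (multiplication, quotient determination, modular reduction) are visible in the conventional word-level Montgomery reduction. The following is a minimal sketch of that conventional form only, not of the paper's proposed algorithm; the names `montgomery_setup`, `redc_multiply`, and `mont_modmul` are hypothetical, and it assumes an odd modulus `n < 2**k` and Python 3.8 or later for `pow(n, -1, r)`.

```python
def montgomery_setup(n: int, k: int):
    """Return (r, n_prime) with r = 2**k and n * n_prime == -1 (mod r)."""
    if n % 2 == 0 or not (0 < n < (1 << k)):
        raise ValueError("n must be odd and satisfy 0 < n < 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def redc_multiply(a: int, b: int, n: int, k: int, n_prime: int) -> int:
    """Return a * b * 2**(-k) mod n for 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = a * b                                # step 1: multiplication
    q = ((t & mask) * n_prime) & mask        # step 2: quotient determination
    u = (t + q * n) >> k                     # step 3: reduction (exact division by 2**k)
    return u - n if u >= n else u            # u < 2n, so one subtraction suffices


def mont_modmul(a: int, b: int, n: int, k: int) -> int:
    """Ordinary a * b mod n computed through the Montgomery domain."""
    _, n_prime = montgomery_setup(n, k)
    a_m = ((a % n) << k) % n                 # convert operands into the Montgomery domain
    b_m = ((b % n) << k) % n
    p_m = redc_multiply(a_m, b_m, n, k, n_prime)
    return redc_multiply(p_m, 1, n, k, n_prime)  # convert the result back
```

Step 2 cannot start before the low half of step 1 is known, and step 3 needs the quotient from step 2; this is the dependency chain the abstract refers to.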
Fast Modular Multiplication using Parallel Prefix Adder
International Journal of Trend in Scientific Research and Development
Public key cryptography applications involve large-integer arithmetic operations that are compute-intensive in terms of power, delay, and area. Modular multiplication, which is used frequently, is the most resource-hungry block. Generally, the last stage of modular multiplication is implemented using a carry propagate adder, whose long carry chain takes more time. In this paper, modular multiplication architectures using carry-save and Kogge parallel prefix adders are presented to reduce this problem. The proposed implementations are faster than conventional carry-save adder and carry propagate adder implementations.
Review Of Fast Multiplication Algorithms For Embedded Systems Design
This paper presents a review, with numerical examples and complexity analysis, of fast multiplication algorithms to help embedded system designers improve hardware performance for many applications such as cryptosystem design. The paper presents two practical multiplication algorithms: the Karatsuba multiplication algorithm with time complexity O(n^1.585) and the Schönhage–Strassen multiplication algorithm with run-time bit complexity O(n log(n) log(log(n))). In addition, the interleaved multiplication algorithm can be used efficiently to compute the modular multiplication with logarithmic time complexity, which improves on the linear time complexity of Montgomery modular multiplication.
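The Karatsuba algorithm mentioned in this abstract replaces four half-size multiplications with three, which is where the sub-quadratic exponent comes from. A minimal sketch on Python integers follows; the function name `karatsuba` and the `threshold_bits` cut-off are assumptions made for illustration.

```python
def karatsuba(x: int, y: int, threshold_bits: int = 64) -> int:
    """Multiply two non-negative integers with Karatsuba's three-product recursion."""
    if x < 0 or y < 0:
        raise ValueError("operands must be non-negative")
    threshold_bits = max(threshold_bits, 4)   # keeps the recursion strictly shrinking
    if x.bit_length() <= threshold_bits or y.bit_length() <= threshold_bits:
        return x * y                          # small operands: use the built-in product
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask          # x = x_hi * 2**half + x_lo
    y_hi, y_lo = y >> half, y & mask
    z0 = karatsuba(x_lo, y_lo, threshold_bits)
    z2 = karatsuba(x_hi, y_hi, threshold_bits)
    # The middle term reuses z0 and z2, so only one extra multiplication is needed.
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi, threshold_bits) - z0 - z2
    return (z2 << (2 * half)) + (z1 << half) + z0
```

Python's own integer multiplication already switches to Karatsuba for large operands, so this sketch is for reading rather than for speed.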
Efficient Implementation of Modular Multiplication using Carry Look Ahead Adder
2013
Data security is an important aspect of information transmission and storage in electronic form. Cryptographic systems are used to encrypt such information to guarantee its security. To retrieve such information, the encrypted form must first be decrypted. One of the most popular cryptographic systems is the RSA public-key cryptosystem. The larger the RSA public modulus, the stronger the RSA cryptosystem. Unfortunately, RSA is extremely vulnerable to timing attacks, which can deduce the private RSA exponent due to regularity of operations in the straightforward implementation of exponentiation using the square-and-multiply method or its variants. Timing attacks constitute a major threat to all systems using RSA, and hence implementations must be protected. The work reported here proposes a secure implementation of the RSA algorithm against timing attacks. This implementation is done using Verilog HDL and targets Xilinx FPGA devices.
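The abstract does not say which countermeasure the authors use. As an assumption for illustration only, one commonly described approach is the Montgomery ladder, which performs one multiplication and one squaring for every exponent bit regardless of its value. The name `ladder_exp` is hypothetical, and Python integer arithmetic is not constant-time, so this sketch shows only the operation schedule.

```python
def ladder_exp(base: int, exponent: int, modulus: int) -> int:
    """Montgomery ladder: the same two operations are executed for every exponent bit."""
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive and exponent non-negative")
    r0, r1 = 1 % modulus, base % modulus      # invariant: r1 == r0 * base (mod modulus)
    for i in reversed(range(exponent.bit_length())):
        if (exponent >> i) & 1:
            r0, r1 = (r0 * r1) % modulus, (r1 * r1) % modulus
        else:
            r0, r1 = (r0 * r0) % modulus, (r0 * r1) % modulus
    return r0
```

A hardware version would typically replace the branch with a data-independent swap or multiplexer so that the control flow does not depend on the key either.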
EFFICIENT HARDWARE ARCHITECTURES FOR MODULAR MULTIPLICATION ON FPGAS
The computational fundament of most public-key cryptosystems is the modular multiplication. Improving the efficiency of the modular multiplication is directly associated with the efficiency of the whole cryptosystem. This paper presents an implementation and comparison of three recently proposed, highly efficient architectures for modular multiplication on FPGAs: interleaved modular multiplication and two variants of the Montgomery modular multiplication. This (first) hardware implementation of these designs shows their relative performance regarding area and speed.
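The interleaved modular multiplication named in this abstract alternates a shift-and-add step with a reduction step, so intermediate values never grow far beyond the modulus. A minimal software sketch of the classic bit-serial form follows; the name `interleaved_modmul` is hypothetical, and the FPGA architectures compared in the paper are not reproduced here.

```python
def interleaved_modmul(x: int, y: int, m: int) -> int:
    """Return x * y mod m for 0 <= x, y < m, scanning x from its most significant bit."""
    if m <= 0 or not (0 <= x < m and 0 <= y < m):
        raise ValueError("require m > 0 and 0 <= x, y < m")
    p = 0
    for i in reversed(range(x.bit_length())):
        p <<= 1                     # double the partial product
        if (x >> i) & 1:
            p += y                  # add the multiplicand when the current bit is 1
        # Here p <= 2*(m-1) + (m-1) < 3*m, so two conditional subtractions reduce it.
        if p >= m:
            p -= m
        if p >= m:
            p -= m
    return p
```

Only shifts, additions, and comparisons are used, which is what makes this form attractive for hardware.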
Modular Multiplication of Large Integers on FPGA
2005
Public key cryptography often involves modular multiplication of large operands (160 up to 2048 bits). Several researchers have proposed iterative algorithms whose internal data are carry-save numbers. This number system is unfortunately not well suited to today's Field Programmable Gate Arrays (FPGAs) embedding dedicated carry logic.
Fast Modular Multiplication Execution in Residue Number System
In this paper, we propose a new method of computing modular multiplication based on the Residue Number System. We use an approximate method to find the residue of a product modulo the given modulus. We replace expensive modular operations with fast right-shift operations and extraction of the low-order bits. Simulation on a Kintex7 XC7K70T board showed that the proposed method reduces time by 75% on average and area by 80% on average relative to the modified method from work [1], which makes it more suitable for hardware implementation of cryptographic primitives constructed over a prime finite field.
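For readers unfamiliar with the Residue Number System, the following minimal sketch shows the channel-wise multiplication that makes it attractive, followed by a plain Chinese Remainder Theorem reconstruction. It does not implement the approximate reduction proposed in the paper; the moduli set and all function names are assumptions chosen for illustration, and `math.prod` and `pow(x, -1, m)` require Python 3.8 or later.

```python
from math import gcd, prod

# Hypothetical pairwise-coprime channel moduli; their product bounds the dynamic range.
MODULI = (251, 253, 255, 256)


def to_rns(x: int, moduli=MODULI):
    """Represent x by its residues in each channel."""
    return [x % m for m in moduli]


def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel; no carries cross between channels."""
    return [(ai * bi) % m for ai, bi, m in zip(a, b, moduli)]


def from_rns(residues, moduli=MODULI) -> int:
    """Reconstruct the integer in [0, prod(moduli)) with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)
    return total % big_m


def rns_modmul(x: int, y: int, p: int, moduli=MODULI) -> int:
    """x * y mod p via RNS; valid only while x * y < prod(moduli)."""
    if any(gcd(a, b) != 1 for i, a in enumerate(moduli) for b in moduli[i + 1:]):
        raise ValueError("moduli must be pairwise coprime")
    if x * y >= prod(moduli):
        raise ValueError("product exceeds the RNS dynamic range")
    product = from_rns(rns_mul(to_rns(x, moduli), to_rns(y, moduli), moduli), moduli)
    return product % p   # ordinary final reduction, the costly step the paper aims to replace
```

The final `% p` after reconstruction is the expensive part in practice, which is the step the abstract says is replaced by shifts and low-bit extraction.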
A Survey on Hardware Architectures for Montgomery Modular Multiplication Algorithm
The motivation for studying high-speed and space-efficient algorithms for modular multiplication comes from their applications in public-key cryptography. The Montgomery multiplication algorithm speeds up the modular multiplications and squarings required for exponentiation. This paper presents new architectures for the computation of modular multiplication and exponentiation using the Montgomery multiplication (MM) algorithm. Montgomery modular multiplication (MMM) is one of the fundamental operations used in cryptographic algorithms such as RSA, Diffie-Hellman key distribution, and elliptic curve cryptosystems. In this paper we compare the new hardware architectures that are able to perform Montgomery multiplication.
Fast integer multiplication using modular arithmetic
Electronic Colloquium on Computational Complexity, 2008
We give an O(N log N 2^{O(log* N)}) algorithm for multiplying two N-bit integers that improves the O(N log N log log N) algorithm by Schönhage-Strassen (SS71). Both these algorithms use modular arithmetic. Recently, Fürer (Fur07) gave an O(N log N 2^{O(log* N)}) algorithm which however uses arithmetic over complex numbers as opposed to modular arithmetic. In this paper, we use multivariate polynomial multiplication
Journal of The Institution of Engineers (India): Series B, 2019
Modular multiplication is a key operation in RSA cryptosystems. Modular multipliers can be realized using Montgomery algorithm. Montgomery algorithm employing carry save adders makes modular multiplication suitable and efficient. Montgomery modular multiplication can be carried out in two ways. All the operands are kept in carry save form in one of the ways. The input and output are kept in binary form, and intermediate operands are kept in carry save form in the other way which requires an efficient format converter. This paper proposes a fast and high-throughput Montgomery modular multiplier which employs an efficient format conversion method. Format conversion is carried out through a format conversion unit which consists of a carry look-ahead unit and multiplexer unit. In addition, this multiplier merges two iterations, which reduces the number of clock cycles significantly. Merger of iteration requires integer multiples of inputs which is computed using the same format converter. Critical path delay of the multiplier is minimized by multiplying one of the inputs by four which simplifies necessary intermediate calculations. The total time required for one complete multiplication is significantly minimized due to reduction in required number of clock cycles with optimum critical path delay. Experimental results show that the proposed multiplier achieves significant speed and throughput improvement as compared to previous designs.
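The carry-save form and the final format conversion discussed in this abstract can be illustrated in software. The sketch below models a 3-to-2 carry-save adder on Python integers and defers the single carry-propagating addition to the end; the names `carry_save_add` and `sum_operands` are hypothetical, and the paper's format conversion unit is not modelled.

```python
def carry_save_add(a: int, b: int, c: int):
    """3-to-2 compression: return (sum_word, carry_word) with sum_word + carry_word == a + b + c."""
    sum_word = a ^ b ^ c                              # bitwise sum without carries
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted to the next position
    return sum_word, carry_word


def sum_operands(operands) -> int:
    """Accumulate non-negative operands in carry-save form, then convert to binary once."""
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)   # no carry chain inside the loop
    return s + c                         # format conversion: one carry-propagate addition
```

Each bit position of `carry_save_add` depends only on the three input bits at that position, which is why its delay does not grow with the word length.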
Efficiency and performance review of Montgomery modular multiplication based on VLSI architecture
Various modular multiplication techniques have been developed for the multiplication of large integers. This paper focuses mainly on the Montgomery algorithm for modular multiplication (MM). Designs that modify the basic Montgomery algorithm to improve performance and area-time product are reviewed, and the improvements made to the algorithm in each design are traced in sequence from the original. The review starts with the alterations to the basic algorithm that use carry-save adders (CSA) to eliminate the long carry propagation in large binary operations, and covers the format conversion issues that increase the required clock cycles when converting from carry-save format to binary representation. The future scope of design modifications for reduced critical-path delay and hardware cost compared with previous designs is discussed.
An Optimized Montgomery Modular Multiplication Algorithm for Cryptography
2013
Montgomery modular multiplication is one of the fundamental operations used in cryptographic algorithms, such as RSA and elliptic curve cryptosystems. Previous Montgomery multipliers perform a single Montgomery multiplication in approximately 2n clock cycles, where n is the size of the operands in bits, and require a larger number of addition stages for large word length additions. In this paper, a new Montgomery modular multiplier is proposed which performs the same operation in approximately n clock cycles with almost the same clock period. The proposed multiplier uses carry select adders (CSLAs) to perform large word length additions. The carry select adder is based on the concept of a Binary to Excess-1 converter (BEC). The proposed algorithm uses the concept of precomputing partial results under the two possible assumptions regarding the most significant bit of the previous word. The optimized algorithm is simulated using Xilinx ISE 12.1i and implemented on a Virtex5 FPGA device. Keyword...
FPGA Implementation of Modular Exponentiation Using Single Modular Multiplier
2014
This paper presents an FPGA implementation of Modular Exponentiation (ME) based on a Software/Hardware (SW/HW) approach. In the Rivest, Shamir and Adleman (RSA) cryptosystem, ME, which is computed by a series of Modular Multiplications (MMs), is the main function used to encrypt and decrypt data. In order to achieve the best trade-off between area, speed, and flexibility, we propose in this work an embedded system where the ME algorithm is managed in SW using the Xilinx MicroBlaze processor. The MM is implemented as a HW core around the processor. Because the MM is usually considered a critical arithmetic operation, Montgomery modular multiplication, which requires only simple shifts and additions, is used to realize the HW architecture of our MM core. The results show that, when applied to 1024-bit RSA, the execution time of the ME is about 109.5 ms, while in terms of hardware resources the device requires 1645 slices. Keywords— Modular Exponentiation, Montgomery Modular Multiplicat...
A modified radix-2 Montgomery modular multiplication with new recoding method
Ieice Electronic Express, 2010
The Montgomery modular multiplication algorithm is commonly used in implementations of the RSA cryptosystem and other cryptosystems with modular operations. The radix-2 version of this algorithm is simple and fast in hardware implementations. In this paper this algorithm is modified with a new recoding method to make it simpler and faster. We have also implemented this new algorithm with carry save adders. Results show that, on average, the proposed algorithm achieves about a 47% increase in data throughput with at most a 7% increase in hardware area compared with the conventional algorithm.
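The conventional radix-2 Montgomery algorithm that this abstract takes as its baseline can be sketched bit-serially as follows. The recoding method proposed in the paper is not shown; the name `montgomery_radix2` is hypothetical, and the sketch assumes an odd modulus `n < 2**k` and Python 3.8 or later if the result is checked with `pow(2, -k, n)`.

```python
def montgomery_radix2(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2**(-k) mod n for odd n with 0 <= a, b < n < 2**k."""
    if n % 2 == 0 or not (0 <= a < n and 0 <= b < n and n < (1 << k)):
        raise ValueError("require odd n and 0 <= a, b < n < 2**k")
    s = 0
    for i in range(k):
        s += ((a >> i) & 1) * b      # add b when bit i of a is set
        q = s & 1                    # quotient bit: is the running sum odd?
        s = (s + q * n) >> 1         # adding n makes s even, so the halving is exact
    # The running sum stays below 2n, so one final subtraction is enough.
    return s - n if s >= n else s
```

Each iteration needs only two additions and a one-bit shift, which is why the radix-2 form maps so directly onto adder-based hardware.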
Efficient Adders to Speedup Modular Multiplication for Cryptography
Modular multiplication is an essential operation in many cryptographic arithmetic operations. This work addresses modular multiplication algorithms, focusing on improving their underlying binary adders. Different known adders have been considered and studied. The carry-save adder, carry-lookahead adder, and carry-skip adder showed interesting features and trade-offs. The adders' VHDL implementations gave some further beneficial details that are promising for improved crypto designs.
IEEE Transactions on Circuits and Systems I: Regular Papers, 2000
Modular exponentiation is the cornerstone computation in public-key cryptography systems such as RSA cryptosystems. The operation is time consuming for large operands. This paper describes the characteristics of three architectures designed to implement modular exponentiation using the fast binary method: the first field-programmable gate array (FPGA) prototype has a sequential architecture, the second has a parallel architecture, and the third has a systolic array-based architecture. The paper compares the three prototypes as well as Blum and Paar's implementation using the time area classic factor. All three prototypes implement the modular multiplication using the popular Montgomery algorithm.
New iterative algorithms for modular multiplication
Signal Processing, 2004
The new modular multiplier structures proposed in this paper are based on a short precision magnitude comparison instead of the full magnitude comparison operation. Another feature of these structures is that the comparison operations are carried out first. Only once this has been achieved does the reduction operation take place, whereas in previous work the comparison and the reduction operations are interleaved. This has resulted in a reduction of the number of stages required for the implementation of the modular reduction operation. Serial implementations have shown that the new radix-2 algorithm has better area usage than similar structures available in the literature, while the proposed radix-4 algorithm exhibits better area usage than similar structures with relatively similar speed performance. The parallel implementation of these algorithms has also shown that the new radix-4 algorithm has the best area usage while its speed performance is similar to that of structures proposed in the literature.
IEEE Access, 2020
In this paper, an area-time efficient hardware implementation of modular multiplication over five National Institute of Standards and Technology (NIST)-recommended prime fields is proposed for lightweight elliptic curve cryptography (ECC). A modified radix-2 interleaved algorithm is proposed to reduce the time complexity of conventional interleaved modular multiplication. The proposed multiplication algorithm is designed in hardware and separately implemented on Xilinx Virtex-7, Virtex-6, Virtex-5, and Virtex-4 field-programmable gate array (FPGA) platforms. On the Virtex-7 FPGA, the proposed design
Modular Multiplication using Redundant Digit Division
18th IEEE Symposium on Computer Arithmetic (ARITH '07), 2007
Most implementations of the modular exponentiation computation, M^E mod N, in cryptographic algorithms employ Montgomery multiplication, ABR^(-1) mod N, instead of modular multiplication, AB mod N, even though the former requires some transformational overhead. This is so because a state-of-the-art Montgomery multiplication implementation has a performance advantage over direct modular multiplication based on the Barrett algorithm that more than compensates for the overhead. In this paper, we present a direct modular multiplication method that is comparable in speed to Montgomery multiplication. One consequence is that when the exponent is small, direct computation (which does not incur the transformational overhead) using the modular multiplication algorithm presented here results in a practical performance gain. For the exponent 17, for instance, which requires five modular multiplications, a saving of up to 40% can be achieved.
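The count of five modular multiplications for the exponent 17 follows from its binary form 10001: four squarings and one multiplication. The sketch below makes that count explicit using a simple form of Barrett reduction, the direct baseline the abstract mentions; it does not implement the paper's redundant digit division method, and the names `barrett_setup`, `barrett_reduce`, and `pow17` are hypothetical.

```python
def barrett_setup(n: int):
    """Precompute k = bit length of n and mu = floor(4**k / n) for a fixed modulus n."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()
    mu = (1 << (2 * k)) // n
    return k, mu


def barrett_reduce(x: int, n: int, k: int, mu: int) -> int:
    """Return x mod n for 0 <= x < n*n using multiplications and shifts instead of division."""
    q = (x * mu) >> (2 * k)      # estimate of floor(x / n), never too large
    r = x - q * n
    while r >= n:                # the estimate is low by a small amount at most
        r -= n
    return r


def pow17(m: int, n: int) -> int:
    """Compute m**17 mod n for 0 <= m < n with exactly five modular multiplications."""
    k, mu = barrett_setup(n)
    x = m
    for _ in range(4):                        # four squarings give m**16
        x = barrett_reduce(x * x, n, k, mu)
    return barrett_reduce(x * m, n, k, mu)    # one multiplication gives m**17
```

No conversion into or out of a Montgomery domain appears in `pow17`, which is the overhead the abstract says becomes significant when the exponent is this small.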