Approaches for the performance increasing of software implementation of integer multiplication in prime fields
Related papers
Techniques for Performance Improvement of Integer Multiplication in Cryptographic Applications
Mathematical Problems in Engineering, 2014
The performance of arithmetic operations in number fields is actively researched, as evidenced by the many significant publications in this area. In this work, we offer some techniques to increase the performance of software implementations of the finite field multiplication algorithm on both 32-bit and 64-bit platforms. The developed technique, called the “delayed carry mechanism,” removes the need to handle the carry out of the most significant bit at each iteration of the sum accumulation loop. This mechanism reduces the total number of additions and makes it possible to apply modern parallelization technologies effectively.
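The delayed carry idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: operands are little-endian lists of 32-bit limbs, the helper names (`to_limbs`, `from_limbs`, `mul_delayed_carry`) are hypothetical, and Python's unbounded integers stand in for the wide accumulators that a 32-bit or 64-bit implementation would need.

```python
MASK = (1 << 32) - 1  # assumed limb width: 32 bits


def to_limbs(x, n):
    """Split a non-negative integer into n little-endian 32-bit limbs."""
    return [(x >> (32 * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Recombine little-endian 32-bit limbs into an integer."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))


def mul_delayed_carry(a, b):
    """Schoolbook product of two limb lists with carries deferred."""
    cols = [0] * (len(a) + len(b))
    # Accumulation phase: add each partial product to its column and
    # do not propagate any carry between columns yet.
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            cols[i + j] += ai * bj
    # Assimilation phase: one pass that pushes the deferred carries up.
    out, carry = [], 0
    for c in cols:
        t = c + carry
        out.append(t & MASK)
        carry = t >> 32
    assert carry == 0  # the product always fits in len(a) + len(b) limbs
    return out
```

For example, `from_limbs(mul_delayed_carry(to_limbs(x, 4), to_limbs(y, 4)))` should equal `x * y` for any 128-bit `x` and `y`. In a fixed-width language the column accumulators must be wide enough to hold the deferred carries, which is the main design constraint this sketch hides.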
Approaches for the Parallelization of Software Implementation of Integer Multiplication
This paper considers several approaches to increasing the performance of software implementations of the integer multiplication algorithm on 32-bit and 64-bit platforms via parallelization. The main idea of the parallelization is the delayed carry mechanism that the authors proposed earlier. The delayed carry removes the dependency between iterations of the loop that accumulates sums of products, which allows loop iterations to be executed in parallel in separate threads. Once the sum accumulation threads complete, the final result must be corrected by assimilating the carries. The first approach optimizes parallelization for two execution threads; the second is an evolution of the first and targets three or more execution threads. The proposed parallelization approaches increase the total computational complexity of the algorithm compared with a single execution thread, but decrease total execution time o...
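Because the column sums no longer depend on each other once carries are deferred, they can be split across workers. The sketch below is one plausible reading of that idea, not the authors' scheme: the function names, the even split of columns across workers, and the use of `ThreadPoolExecutor` are all assumptions made for illustration. CPython threads will not give a real speedup for this workload; the point is only to show the independence of the column ranges and the final carry assimilation.

```python
from concurrent.futures import ThreadPoolExecutor

MASK = (1 << 32) - 1  # assumed limb width: 32 bits


def column_sums(a, b, k_lo, k_hi):
    """Carry-free sums of partial products for columns k_lo..k_hi-1."""
    sums = []
    for k in range(k_lo, k_hi):
        s = 0
        # All pairs (i, j) with i + j == k and both indices in range.
        for i in range(max(0, k - len(b) + 1), min(k, len(a) - 1) + 1):
            s += a[i] * b[k - i]
        sums.append(s)
    return sums


def parallel_mul(a, b, workers=2):
    """Multiply little-endian limb lists, one column range per worker."""
    total = len(a) + len(b)
    step = -(-total // workers)  # ceiling division
    ranges = [(lo, min(lo + step, total)) for lo in range(0, total, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: column_sums(a, b, *r), ranges))
    # After all workers finish, assimilate the deferred carries once.
    out, carry = [], 0
    for c in (s for part in parts for s in part):
        t = c + carry
        out.append(t & MASK)
        carry = t >> 32
    return out
```

Splitting columns evenly gives the workers unequal amounts of work, since the middle columns receive the most partial products; a balanced partition would be a natural refinement.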
EFFICIENT IMPLEMENTATION OF BIT PARALLEL FINITE FIELD MULTIPLIERS
Arithmetic in finite (Galois) fields is a major aspect of many applications, such as error-correcting codes and cryptography. Addition and multiplication are the two basic operations in the finite field GF(2^m). Finite field multiplication is the most resource- and time-consuming operation. In this paper, the space complexity analysis and an efficient FPGA implementation of a bit-parallel Karatsuba multiplier over GF(2^m) are presented. This is especially interesting for high-performance systems because of its carry-free property. To reduce the complexity of the classical multiplier, a lower-complexity multiplier over GF(2^m) based on the Karatsuba multiplier is used. The LUT complexity is evaluated on an FPGA using Xilinx ISE 8.1i. Furthermore, experimental results on FPGAs for the bit-parallel Karatsuba multiplier and the classical multiplier are shown, and a comparison table is provided. To the best of our knowledge, the bit-parallel Karatsuba multiplier consumes the fewest resources among the known FPGA implementations.
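As a software illustration of the carry-free Karatsuba recurrence mentioned in this abstract, the sketch below multiplies binary polynomials stored as Python integers (bit i is the coefficient of x^i). It is an assumption-laden sketch, not the FPGA design from the paper: the function names and the 8-bit recursion cutoff are hypothetical, and the final reduction modulo an irreducible polynomial, which a GF(2^m) multiplier also needs, is left out.

```python
def clmul(a, b):
    """Classical carry-less (XOR-based) product of two binary polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba_gf2(a, b, bits):
    """Karatsuba product over GF(2)[x] for operands of at most `bits` bits."""
    if bits <= 8:  # assumed cutoff: fall back to the classical method
        return clmul(a, b)
    h = bits // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = karatsuba_gf2(a0, b0, h)
    hi = karatsuba_gf2(a1, b1, bits - h)
    # Addition and subtraction are both XOR in characteristic 2, so the
    # middle term needs no carry handling at all.
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, bits - h)
    return (hi << (2 * h)) ^ ((mid ^ lo ^ hi) << h) ^ lo
```

Each level replaces four half-size products with three, which is where the reduction in complexity relative to the classical multiplier comes from.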
Efficient Hardware Implementation of Finite Fields with Applications to Cryptography
Acta Applicandae Mathematicae, 2006
The paper presents a survey of most common hardware architectures for finite field arithmetic especially suitable for cryptographic applications. We discuss architectures for three types of finite fields and their special versions popularly used in cryptography: binary fields, prime fields and extension fields. We summarize algorithms and hardware architectures for finite field multiplication, squaring, addition/subtraction, and inversion for each of these fields. Since implementations in hardware can either focus on high-speed or on area-time efficiency, a careful choice of the appropriate set of architectures has to be made depending on the performance requirements and available area.
Proceedings 14th Annual IEEE International ASIC/SOC Conference (IEEE Cat. No.01TH8558), 2001
Multiplication in finite fields (Galois fields) is a basic operation for cryptography applications. Recent proposals for elliptic curve cryptography require efficient computation of multiplication in finite fields of type GF(2^n) for large values of n (150, 200 bits). Digit-serial multiplier VLSI architectures are an attractive solution, being a compromise between purely parallel and purely serial ones. A comparative study of digit-serial multiplier VLSI architectures for fields of type GF(2^n) is carried out. Such architectures are reviewed, some further optimisations are proposed, and they are then implemented in VHDL (CMOS cell library, 0.35 μm, by ST Microelectronics). Figures of merit such as time latency, silicon area, and power consumption are evaluated by simulation with Synopsys tools, varying parameters such as the size n of the field elements and the size k of the blocks of bits processed in parallel by the digit-serial architectures.
Efficient pipelining for modular multiplication architectures in prime fields
ACM Great Lakes Symposium on VLSI, 2007
This paper presents a pipelined architecture of a modular Montgomery multiplier, which is suitable to be used in public key coprocessors. Starting from a baseline implementation of the Montgomery algorithm, a more compact pipelined version is derived. The design makes use of 16-bit integer multiplication blocks that are available on recently manufactured FPGAs. The critical path
A New Look-Up Table Approach for High-Speed Finite Field Multiplication
2011 International Symposium on Electronic System Design, 2011
This paper presents a new high-speed multiplier over GF(2^m) based on a look-up table (LUT) approach. A straightforward LUT-based multiplication requires a table of size (m × 2^m) bits for the Galois field of order m, which is quite large for the fields of large orders recommended by the National Institute of Standards and Technology (NIST). Therefore, in this paper, we propose a digit-serial LUT-based technique, where a certain number of operand bits are grouped into digits, and multiplication is performed in a serial/parallel manner. We restrict the digit size to 4 so that only 16 words are stored in the LUT. We have also proposed a digit-parallel design that achieves higher speed than its digit-serial counterpart, which is useful for high-speed applications. We have chosen m = 233 to satisfy the security requirements of elliptic curve cryptography, but our method can be used for other prime extensions as well. We have estimated the area-time complexity of our designs in terms of LUT access time and XOR delay. The proposed LUT-based implementation will be useful for high-speed applications in elliptic curve cryptography and error control coding.
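The digit-serial LUT idea with 4-bit digits can be sketched in software as follows. This is only an illustration under assumptions, not the hardware design from the paper: the function names are hypothetical, field elements are Python integers whose bit i is the coefficient of x^i, and the example uses the small field GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 rather than m = 233, so the result can be checked against the worked example in FIPS 197.

```python
def reduce_gf2(x, poly, m):
    """Reduce a binary polynomial x modulo poly, where poly has degree m."""
    while x.bit_length() > m:
        # Align the modulus with the leading term of x and cancel it.
        x ^= poly << (x.bit_length() - 1 - m)
    return x


def lut_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m), consuming b four bits at a time."""
    # 16-word table: table[d] = a * d mod poly for every 4-bit digit d.
    table = [0] * 16
    for d in range(1, 16):
        t = 0
        for bit in range(4):
            if (d >> bit) & 1:
                t ^= a << bit
        table[d] = reduce_gf2(t, poly, m)
    r = 0
    # Horner-style scan of b from the most significant digit down.
    for i in reversed(range((m + 3) // 4)):
        r = reduce_gf2(r << 4, poly, m)
        r ^= table[(b >> (4 * i)) & 0xF]
    return r


# FIPS 197 example: {57} * {83} = {c1} in GF(2^8) with modulus 0x11B.
product = lut_mul(0x57, 0x83, 0x11B, 8)
```

The table depends on `a`, so it pays off when one operand is reused or when the 16 table entries are cheap to build relative to the m/4 lookup steps.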
Efficient Software-Implementation of Finite Fields with Applications to Cryptography
Acta Applicandae Mathematicae, 2006
In this work, we present a survey of efficient techniques for software implementation of finite field arithmetic especially suitable for cryptographic applications. We discuss different algorithms for three types of finite fields and their special versions popularly used in cryptography: binary fields, prime fields and extension fields. Implementation details of the algorithms for field addition/subtraction, field multiplication, field reduction and field inversion for each of these fields are discussed in detail. The efficiency of these different algorithms depends largely on the underlying microprocessor architecture. Therefore, a careful choice of the appropriate set of algorithms has to be made for a software implementation depending on the performance requirements and available resources.
Mathematics Subject Classifications: 12-02, 12E30, 12E10
Key words: field arithmetic, cryptography, efficient implementation, binary field arithmetic, prime field arithmetic, extension field arithmetic, optimal extension fields
A Scalable and Unified Multiplier Architecture for Finite Fields GF(p) and GF(2^m)
2000
We describe a scalable and unified architecture for a Montgomery multiplication module which operates in both types of finite fields, GF(p) and GF(2^m). The unified architecture requires only slightly more area than that of the multiplier architecture for the field GF(p). The multiplier is scalable, which means that a fixed-area multiplication module can handle operands of any size, and also, the word size can be selected based on the area and performance requirements. We utilize the concurrency in the Montgomery multiplication operation by employing a pipelining design methodology. We also describe a scalable and unified adder module to carry out concomitant operations in our implementation of the Montgomery multiplication. The upper limit on the precision of the scalable and unified Montgomery multiplier is dictated only by the available memory to store the operands and internal results, and the module is capable of performing infinite-precision Montgomery multiplication in both types of finite fields.
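For readers unfamiliar with Montgomery multiplication, the textbook GF(p) case can be sketched in a few lines. This is a generic software illustration, not the scalable word-serial hardware architecture described in the abstract: the function names are hypothetical, the modulus is assumed odd, R = 2^k is assumed larger than the modulus, and the three-argument `pow` with a negative exponent requires Python 3.8 or later.

```python
def mont_setup(n, k):
    """Return n' = -n^(-1) mod 2^k for an odd modulus n < 2^k."""
    r = 1 << k
    return (-pow(n, -1, r)) % r  # modular inverse via pow (Python 3.8+)


def redc(t, n, k, n_prime):
    """Montgomery reduction: t * 2^(-k) mod n, for 0 <= t < n * 2^k."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k  # t + m*n is divisible by 2^k by construction
    return u - n if u >= n else u


def mont_mul(x, y, n, k, n_prime):
    """Product of two values already in Montgomery form (x*R, y*R mod n)."""
    return redc(x * y, n, k, n_prime)


# Usage sketch with an illustrative odd prime modulus.
n, k = 2**255 - 19, 256
n_prime = mont_setup(n, k)
a, b = 123456789, 987654321
a_m, b_m = (a << k) % n, (b << k) % n  # convert to Montgomery form
result = redc(mont_mul(a_m, b_m, n, k, n_prime), n, k, n_prime)
```

The reduction replaces division by n with masking and shifting by k bits, which is why the method maps well to both software and hardware; the GF(2^m) variant in the abstract follows the same pattern with XOR in place of integer addition.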