A Fast Radix-4 Floating-Point Divider with Quotient Digit Selection by Comparison Multiples

A novel implementation of radix-4 floating-point division/square-root using comparison multiples

Computers & Electrical Engineering, 2010

A new implementation of minimally redundant radix-4 floating-point SRT division/square-root (div/sqrt) with the recurrence in signed-digit format is introduced. The implementation is based on the idea of comparison multiples. In the proposed approach, the magnitude of the quotient (root) digit is calculated by comparing the truncated partial remainder with two limited-precision multiples of the divisor (partial root). The digit sign is determined from the polarity of the truncated partial remainder. A timing evaluation using logic synthesis (Synopsys DC with the Artisan 0.18 μm typical library) shows a latency of 2.5 ns for the recurrence of the proposed div/sqrt, which is less than that of the conventional implementation.
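The selection idea in this abstract can be illustrated with a small software model. The sketch below is not the paper's hardware design: it uses exact rational arithmetic instead of a truncated signed-digit remainder, and the two comparison multiples (d/2 and 3d/2) as well as the function name `radix4_divide` are assumptions chosen for illustration. The digit magnitude comes from two comparisons and the sign from the polarity of the shifted remainder.

```python
from fractions import Fraction

def radix4_divide(x, d, steps=12):
    """Radix-4 digit recurrence with digits in {-2, ..., 2}.

    Assumes x and d lie in [1, 2). The thresholds d/2 and 3d/2 are
    illustrative comparison multiples, not values taken from the paper.
    """
    x, d = Fraction(x), Fraction(d)
    w = x / 4                      # initial residual, so that |w| <= (2/3) d
    low, high = d / 2, 3 * d / 2   # the two comparison multiples
    digits = []
    for _ in range(steps):
        shifted = 4 * w
        mag = abs(shifted)
        # Magnitude: compare |4w| against the two multiples of d.
        if mag < low:
            m = 0
        elif mag < high:
            m = 1
        else:
            m = 2
        # Sign: polarity of the shifted remainder.
        q = -m if shifted < 0 else m
        digits.append(q)
        w = shifted - q * d        # next residual
    # Undo the initial division of x by 4.
    quotient = 4 * sum(Fraction(q, 4 ** (j + 1)) for j, q in enumerate(digits))
    return quotient, digits, w
```

With these thresholds the residual stays within (2/3)·d at every step, so after n steps the returned quotient differs from x/d by at most (2/3)·4^(1−n).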

Radix 2 division with over-redundant quotient selection

IEEE Transactions on Computers, 1997

In this paper we present a new radix 2 division algorithm that uses a recurrence employing simple 3-to-2 digit carry-free adders to perform carry-free addition/subtraction for computing the partial remainders in radix 2 signed-digit form. During any iteration of the division recurrence, the quotient digit is generated from the two most-significant radix 2 digits of the partial remainder, independently of the divisor, in over-redundant radix 2 digit form (i.e., with digits belonging to the digit set {−2, −1, 0, +1, +2}). The over-redundant quotient digits are then converted to conventional radix 2 digits (belonging to the set {−1, 0, +1}) by a reduction technique. This division algorithm is well suited to IEEE 754 standard operands in the range [1, 2) and is slightly faster than previously proposed radix 2 designs that do not employ input scaling (such as radix 2 SRT), since the quotient selection for such algorithms is a function of more than two most-significant radix 2 digits of the partial remainder. Compared with designs that employ input scaling, the proposed design, although slightly slower, saves the hardware required for scaling.

A radix-16 SRT division unit with speculation of the quotient digits

Proceedings Ninth Great Lakes Symposium on VLSI, 1999

The speed of a divider based on a digit-recurrence algorithm depends mainly on the latency of the quotient digit generation function. In this paper we present an analytical approach that extends the theory developed for standard SRT division and makes it possible to implement division schemes in which a simpler function speculates the quotient digit. This leads to division units with a shorter cycle time and variable latency, since a speculation error may occur and a post-correction of the quotient may be necessary. We have applied our algorithm to the design of a radix-16 speculative divider for double-precision floating-point numbers, which proved to be faster than analogous implementations.

A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture

IEEE Transactions on Computers, 2007

In this work, we present a radix-10 division unit that is based on the digit-recurrence algorithm. The previous decimal division designs do not include recent developments in the theory and practice of this type of algorithm, which were developed for radix-2^k dividers. In addition to the adaptation of these features, the radix-10 quotient digit is decomposed into a radix-2 digit and a radix-5 digit in such a way that only five and two times the divisor are required in the recurrence. Moreover, the most significant slice of the recurrence, which includes the selection function, is implemented in radix-2, avoiding the additional delay introduced by the radix-10 carry-save additions and allowing the balancing of the paths to reduce the cycle delay. The results of the implementation of the proposed radix-10 division unit show that its latency is close to that of radix-16 division units (comparable dynamic range of significands) and it has a shorter latency than a radix-10 unit based on the Newton-Raphson approximation.
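The digit decomposition described above can be illustrated as follows. This is a sketch under stated assumptions: the abstract does not give the digit ranges, so the split q = 5·q_H + q_L with q_H in {−1, 0, 1} and q_L in {−2, ..., 2} (giving q in {−7, ..., 7}) is assumed here, and the function names are hypothetical. The point is that the product q·d can then be formed from d, 2d and 5d alone.

```python
def split_radix10_digit(q):
    """Split q in {-7, ..., 7} as q = 5*q_h + q_l (assumed digit ranges)."""
    if not -7 <= q <= 7:
        raise ValueError("digit outside the assumed set {-7, ..., 7}")
    if q >= 3:
        q_h = 1
    elif q <= -3:
        q_h = -1
    else:
        q_h = 0
    q_l = q - 5 * q_h              # always lands in {-2, ..., 2}
    return q_h, q_l

def digit_times_divisor(q, d):
    """Form q*d using only the multiples d, 2d and 5d of the divisor."""
    q_h, q_l = split_radix10_digit(q)
    low_multiples = {0: 0, 1: d, 2: 2 * d}
    low = low_multiples[abs(q_l)]
    if q_l < 0:
        low = -low
    high = q_h * (5 * d)           # 0, +5d or -5d
    return high + low
```

For example, under this assumed split the digit 7 becomes 5 + 2 and the digit 3 becomes 5 − 2, so no multiple such as 3d or 7d has to be precomputed.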

A radix-10 SRT divider based on alternative BCD codings

2007 25th International Conference on Computer Design, 2007

In this paper we present the algorithm and architecture of a radix-10 floating-point divider based on an SRT nonrestoring digit-by-digit algorithm. The algorithm uses conventional techniques developed to speed up radix-2^k division, such as a signed-digit (SD) redundant quotient and digit selection by constant comparison using a carry-save estimate of the partial remainder. To optimize area and latency for decimal, we include novel features such as the use of alternative BCD codings to represent decimal operands, estimates by truncation at any binary position inside a decimal digit, a single customized fast carry-propagate decimal adder for partial remainder computation, initial odd multiple generation and final normalization with rounding, and register placement to exploit advanced high fan-in mux-latch circuits. Rough area-delay estimates show that the proposed divider has a similar latency but less hardware complexity (1.3 area ratio) than a recently published high-performance digit-by-digit implementation. A. Vázquez and E. Antelo were supported in part by the Ministry of Science and Technology of Spain under contract TIN2004-07797-C02.

Improve High Performance Factor for Floating Point SRT Division

2011

The performance of the Sweeney, Robertson, Tocher (SRT) division algorithm depends on two parameters: the radix and the redundancy factor. In this paper, a study of the effect of these parameters on division performance is presented. At each iteration, the SRT algorithm performs a multiplication by the quotient digit. This can be a simple shift if the digit is a power of two; otherwise, the SRT iteration needs a multiplier. We propose, in this work, an approach that circumvents this multiplication by decomposing the quotient digit into two or three terms that are powers of 2. The multiplication is then carried out by simple shifts and a carry-save addition. The implementation of this approach on Virtex-II field-programmable gate-array (FPGA) circuits gives better performance than the approach that uses the embedded 18 x 18-bit multipliers. The iteration delay is independent of the operand size. The reduction tree delays are at most equivalent to the delay of two Vert...
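The shift-and-add idea can be sketched in software. The decomposition used below is the non-adjacent form (NAF), which is an assumption made for illustration and may differ from the decomposition used in the paper; the function names are hypothetical. Each quotient digit is written as a short sum of signed powers of two, so the product with the divisor needs only shifts and additions.

```python
def signed_power_terms(q):
    """Return q as a list of (sign, shift) pairs with q == sum(sign * 2**shift).

    Uses the non-adjacent form, chosen here only as an illustration.
    """
    terms = []
    shift = 0
    while q != 0:
        if q % 2:                   # odd: emit a +1 or -1 digit
            sign = 2 - (q % 4)      # +1 if q = 1 (mod 4), -1 if q = 3 (mod 4)
            terms.append((sign, shift))
            q -= sign
        q //= 2
        shift += 1
    return terms

def multiply_by_shifts(d, q):
    """Compute q*d for an integer divisor d using only shifts and additions."""
    total = 0
    for sign, shift in signed_power_terms(q):
        shifted = d << shift
        total += shifted if sign > 0 else -shifted
    return total
```

For digits with magnitude up to 15, this decomposition never needs more than three terms; for instance, 7 becomes 8 − 1 and 11 becomes 16 − 4 − 1.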

Simple radix-4 division with operands scaling

IEEE Transactions on Computers, 1990

Abstract: A radix-4 division algorithm with operands scaling is proposed. The algorithm uses a recurrence with redundant addition (carry-save or signed-digit) and combines simple scaling with a quotient-digit selection function that depends only on the estimate of the partial remainder and is independent of the divisor. The scheme results in a significant speedup with respect to both the radix-2 and radix-4 without scaling.
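The effect of scaling can be shown with a small model. This is a sketch under assumptions, not the scheme of the paper: the scaling factor is taken as a reciprocal of the divisor rounded to a multiple of 1/64 (the paper uses simple scaling whose details are not given in the abstract), exact rational arithmetic replaces the redundant remainder estimate, and the function name is hypothetical. Once the scaled divisor is close to 1, the quotient digit is obtained by rounding the shifted remainder, without looking at the divisor.

```python
from fractions import Fraction

def scaled_radix4_divide(x, d, steps=12):
    """Radix-4 division with prescaling; assumes x and d lie in [1, 2)."""
    x, d = Fraction(x), Fraction(d)
    # Hypothetical scaling factor: 1/d rounded to a multiple of 1/64,
    # which keeps the scaled divisor z within 1/64 of 1.
    m = Fraction(round(64 / d), 64)
    z = m * d                      # scaled divisor, close to 1
    w = m * x / 4                  # scaled dividend as the initial residual
    digits = []
    for _ in range(steps):
        shifted = 4 * w
        # Selection depends only on the remainder: round, then clamp.
        q = max(-2, min(2, round(shifted)))
        digits.append(q)
        w = shifted - q * z
    # Scaling both operands by m leaves the quotient x/d unchanged.
    quotient = 4 * sum(Fraction(q, 4 ** (j + 1)) for j, q in enumerate(digits))
    return quotient, digits
```

Because dividend and divisor are multiplied by the same factor, the quotient is unchanged; only the selection function becomes simpler, at the cost of the initial scaling step.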

Analysis of Fast Radix-10 Digit Recurrence Algorithms for Fixed-Point and Floating-Point Dividers on FPGAs

International Journal of Reconfigurable Computing, 2013

Decimal floating-point operations are important for applications that cannot tolerate errors from conversions between binary and decimal formats, for instance, commercial, financial, and insurance applications. In this paper we present five different radix-10 digit-recurrence dividers for FPGA architectures. The first one implements a simple restoring shift-and-subtract algorithm, whereas each of the other four implementations performs a nonrestoring digit-recurrence algorithm with signed-digit redundant quotient calculation and carry-save representation of the residuals. More precisely, the quotient digit selection function of the second divider is implemented fully by means of a ROM, the quotient digit selection functions of the third and fourth dividers are based on carry-propagate adders, and the fifth divider decomposes each digit into three components and requires neither a ROM nor a multiplexer. Furthermore, the fixed-point divider is extended to support IEEE 754-2008 complian...

CMOS Implementation of a hybrid radix-4 divider

Solid-State Circuits …, 1994

A 1.2 μm CMOS combinational implementation of a new hybrid radix-4 division algorithm is presented. The algorithm is named hybrid because the dividend, the quotient, and the remainder are represented using the signed-digit set {−2, −1, 0, 1, 2}, while the divisor is represented using the conventional digit set {0, 1, 2, 3}. The divider requires the divisor Y to be pre-scaled to the range 1