A radix-10 SRT divider based on alternative BCD codings
Related papers
A Fast Radix-4 Floating-Point Divider with Quotient Digit Selection by Comparison Multiples
The Computer Journal, 2006
A new implementation for minimally redundant radix-4 SRT division with the recurrence in the signed-digit format is introduced. The implementation is developed based on the comparison multiples idea. In the proposed approach, the quotient digit's magnitude is calculated by comparing the truncated partial remainder with two limited-precision multiples of the divisor. The sign is determined by investigating the polarity of the truncated partial remainder. A timing evaluation using logic synthesis shows a latency of 2.34 ns for the recurrence of the proposed divider. This is 22% less than that of the conventional implementation.
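The comparison idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's design: exact rational arithmetic stands in for the truncated signed-digit remainder, and the two comparison multiples are assumed to be d/2 and 3d/2, which is one valid choice for the digit set {-2, ..., 2}. The names `select_digit` and `srt4_divide` are hypothetical.

```python
from fractions import Fraction

def select_digit(shifted, d):
    """Pick a digit in {-2..2}: magnitude from two comparisons, sign from polarity.

    Assumption: the comparison multiples are d/2 and 3d/2 (not taken from the paper).
    """
    mag = abs(shifted)
    if mag < d / 2:
        m = 0
    elif mag < 3 * d / 2:
        m = 1
    else:
        m = 2
    return -m if shifted < 0 else m

def srt4_divide(x, d, steps=10):
    """Radix-4 digit recurrence w[j+1] = 4*w[j] - q[j+1]*d.

    Assumes 0 < x/4 <= (2/3)*d so the initial remainder is within bounds.
    Returns (approximate quotient x/d, final partial remainder).
    """
    d = Fraction(d)
    w = Fraction(x) / 4          # scale so |w| <= (2/3) * d initially
    q = Fraction(0)
    for j in range(1, steps + 1):
        shifted = 4 * w
        digit = select_digit(shifted, d)
        w = shifted - digit * d  # remainder stays within (2/3) * d
        q += Fraction(digit, 4 ** j)
    return 4 * q, w              # undo the initial scaling by 1/4
```

For example, `srt4_divide(Fraction(5, 8), Fraction(3, 4))` returns a quotient whose error against 5/6 is bounded by (8/3)·4^-10. A hardware selection function would compare only a few most-significant digits, which this sketch does not model.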
Improve High Performance Factor for Floating Point SRT Division
2011
The performance of the Sweeney, Robertson, Tocher (SRT) division algorithm depends on two parameters: the radix and the redundancy factor. In this paper, a study of the effect of these parameters on division performance is presented. At each iteration, the SRT algorithm performs a multiplication by the quotient digit. This can be a simple shift if the digit is a power of two; otherwise, the SRT iteration needs a multiplier. We propose, in this work, an approach to circumvent this multiplication by decomposing the quotient digit into two or three terms that are powers of 2. Then, the multiplication is carried out by simple shifts and a carry-save addition. The implementation of this approach on Virtex-II field-programmable gate-array (FPGA) circuits gives better performance than the approach that uses the embedded 18 x 18-bit multipliers. The iteration delays are independent of the operand sizes. The reduction tree delays are at most equivalent to the delay of two Vert...
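The decomposition described in this abstract can be illustrated with a short sketch. It assumes a non-adjacent-form (signed power-of-two) split of the digit, which may differ from the decomposition the paper uses, and it sums the shifted terms with ordinary addition where hardware would use a carry-save adder. The function names are hypothetical.

```python
def signed_power_terms(q):
    """Split integer q into (sign, shift) pairs so that
    q == sum(sign * (1 << shift)), using the non-adjacent form."""
    terms = []
    shift = 0
    n = q
    while n != 0:
        if n & 1:
            sign = 2 - (n % 4)   # +1 when n % 4 == 1, -1 when n % 4 == 3
            terms.append((sign, shift))
            n -= sign
        n >>= 1
        shift += 1
    return terms

def multiply_by_digit(d, q):
    """Compute q * d using only shifts and additions/subtractions."""
    return sum(sign * (d << shift) for sign, shift in signed_power_terms(q))
```

With this split, every digit of magnitude up to 15 needs at most three terms; for instance, 7 becomes 8 - 1.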
A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture
IEEE Transactions on Computers, 2007
In this work, we present a radix-10 division unit that is based on the digit-recurrence algorithm. The previous decimal division designs do not include recent developments in the theory and practice of this type of algorithm, which were developed for radix-2^k dividers. In addition to the adaptation of these features, the radix-10 quotient digit is decomposed into a radix-2 digit and a radix-5 digit in such a way that only five and two times the divisor are required in the recurrence. Moreover, the most significant slice of the recurrence, which includes the selection function, is implemented in radix-2, avoiding the additional delay introduced by the radix-10 carry-save additions and allowing the balancing of the paths to reduce the cycle delay. The results of the implementation of the proposed radix-10 division unit show that its latency is close to that of radix-16 division units (comparable dynamic range of significands) and it has a shorter latency than a radix-10 unit based on the Newton-Raphson approximation.
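A rough sketch of how such a digit decomposition might look is given below. The digit range -7..7 and the split q = 5*q_hi + q_lo, with q_hi in {-1, 0, 1} and q_lo in {-2, ..., 2}, are assumptions made for illustration; the abstract itself only states that the multiples 5d and 2d suffice. The function names are hypothetical.

```python
def split_digit(q):
    """Split a radix-10 signed digit q (assumed range -7..7) as q = 5*q_hi + q_lo."""
    if not -7 <= q <= 7:
        raise ValueError("digit outside the assumed range -7..7")
    q_hi = 1 if q >= 3 else (-1 if q <= -3 else 0)
    q_lo = q - 5 * q_hi          # always lands in -2..2
    return q_hi, q_lo

def digit_times_divisor(q, d):
    """Form q*d from d and the two precomputed multiples 5d and 2d."""
    q_hi, q_lo = split_digit(q)
    five_d, two_d = 5 * d, 2 * d
    low_part = {-2: -two_d, -1: -d, 0: 0, 1: d, 2: two_d}[q_lo]
    return q_hi * five_d + low_part
```

Under these assumptions, no multiple of the divisor other than d, 2d, and 5d (and their negations) is ever needed.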
A radix-16 SRT division unit with speculation of the quotient digits
Proceedings Ninth Great Lakes Symposium on VLSI, 1999
The speed of a divider based on a digit-recurrence algorithm depends mainly on the latency of the quotient digit generation function. In this paper we present an analytical approach that extends the theory developed for standard SRT division and permits the implementation of division schemes in which a simpler function speculates the quotient digit. This leads to division units with shorter cycle time and variable latency, since a speculation error may occur and a post-correction of the quotient may be necessary. We have applied our algorithm to the design of a radix-16 speculative divider for double-precision floating-point numbers, which proved faster than analogous implementations.
Fast Radix-10 Multiplication Using Redundant BCD Codes
IEEE Transactions on Computers, 2014
We present the algorithm and architecture of a BCD parallel multiplier that exploits some properties of two different redundant BCD codes to speed up its computation: the redundant BCD excess-3 code (XS-3), and the overloaded BCD representation (ODDS). In addition, new techniques are developed to reduce significantly the latency and area of previous representative high-performance implementations. Partial products are generated in parallel using a signed-digit radix-10 recoding of the BCD multiplier with the digit set [-5, 5], and a set of positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) coded in XS-3. This encoding has several advantages. First, it is a self-complementing code, so that a negative multiplicand multiple can be obtained by just inverting the bits of the corresponding positive one. Also, the available redundancy allows a fast and simple generation of multiplicand multiples in a carry-free way. Finally, the partial products can be recoded to the ODDS representation by just adding a constant factor into the partial product reduction tree. Since the ODDS uses a similar 4-bit binary encoding as non-redundant BCD, conventional binary VLSI circuit techniques, such as binary carry-save adders and compressor trees, can be adapted efficiently to perform decimal operations. To show the advantages of our architecture, we have synthesized an RTL model for 16 × 16-digit and 34 × 34-digit multiplications and performed a comparative survey of the previous most representative designs. We show that the proposed decimal multiplier has an area improvement roughly in the range 20-35 percent for similar target delays with respect to the fastest implementation.
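Two of the properties mentioned in this abstract can be checked with a small sketch. It uses the plain (non-redundant) excess-3 code for a single digit and a simple sequential recoding into the digit set [-5, 5]; the paper's redundant XS-3 format and its parallel recoding are not modeled here, and the function names are hypothetical.

```python
def xs3(digit):
    """Excess-3 code of a decimal digit 0..9, as a 4-bit integer."""
    return digit + 3

def invert4(code):
    """Bitwise inversion of a 4-bit code."""
    return code ^ 0b1111

def recode_signed(digits_msd_first):
    """Recode decimal digits (0..9, most significant first) into signed
    digits in [-5, 5]. The result is one position longer than the input."""
    out = []
    carry = 0
    for digit in reversed(digits_msd_first):
        v = digit + carry
        if v > 5:
            v -= 10              # borrow from the next higher position
            carry = 1
        else:
            carry = 0
        out.append(v)
    out.append(carry)            # final transfer becomes the leading digit
    return out[::-1]
```

Inverting the four bits of `xs3(d)` yields `xs3(9 - d)`, the nine's complement, which is what makes the code self-complementing. As a recoding example, `recode_signed([9, 7, 5])` gives `[1, 0, -3, 5]`, that is, 1000 - 30 + 5 = 975.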
THE DESIGN AND IMPLEMENTATION OF A HIGH-PERFORMANCE FLOATING-POINT DIVIDER
1994
The increasing computation requirements of modern computer applications have stimulated a large interest in developing extremely high-performance floating-point dividers. A variety of division algorithms are available, with SRT being utilized in many computer systems. A careful analysis of SRT divider topologies has demonstrated that a relatively simple divider designed in an aggressive circuit style can achieve extremely high performance. Further, an aggressive circuit implementation can minimize many of the performance advantages of more complex divider algorithms. This paper presents the tradeoffs of the different divider topologies, the design of the divider, and performance results.
Radix 2 division with over-redundant quotient selection
IEEE Transactions on Computers, 1997
In this paper we present a new radix 2 division algorithm that uses a recurrence employing simple 3-to-2 digit carry-free adders to perform carry-free addition/subtraction for computing the partial remainders in radix 2 signed-digit form. The quotient digit, during any iteration of the division recursion, is generated from the two most-significant radix 2 digits of the partial remainder, independently of the divisor, in over-redundant radix 2 digit form (i.e., with digits which belong to the digit set {−2, −1, 0, +1, +2}). The over-redundant quotient digits are then converted to the conventional radix 2 digits (belonging to the set {−1, 0, +1}) by using a reduction technique. This division algorithm is well suited for IEEE 754 standard operands belonging to the range [1, 2) and is slightly faster than previously proposed radix 2 designs (such as the radix 2 SRT) that do not employ input scaling, since the quotient selection for such algorithms is a function of more than two most-significant radix 2 digits of the partial remainder. In comparison with the designs that employ input scaling, the proposed design, although slightly slower, saves the hardware required for scaling.
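The conversion from over-redundant digits to the conventional signed-digit set can be sketched as below. The abstract does not describe the paper's reduction technique, so this is one possible carry-free scheme, assumed for illustration: each digit is split into a transfer and an interim digit, and the sign of the next lower digit decides which split is used so that the final sum never leaves {-1, 0, 1}. The function name is hypothetical.

```python
def reduce_over_redundant(digits_msd_first):
    """Convert radix-2 digits in {-2..2} (most significant first) into
    digits in {-1, 0, 1} with the same value. Output is one digit longer."""
    lsd_first = digits_msd_first[::-1]
    n = len(lsd_first)
    transfer = [0] * (n + 1)     # transfer[i] enters position i
    interim = [0] * (n + 1)
    for i, e in enumerate(lsd_first):
        # The transfer arriving here is >= 0 when the lower digit is >= 0,
        # and <= 0 otherwise; choose the interim digit to absorb it.
        lower_nonneg = i == 0 or lsd_first[i - 1] >= 0
        if e == 2:
            t, s = 1, 0
        elif e == -2:
            t, s = -1, 0
        elif e == 0:
            t, s = 0, 0
        elif lower_nonneg:       # keep s in {-1, 0}
            t, s = (1, -1) if e == 1 else (0, -1)
        else:                    # keep s in {0, 1}
            t, s = (0, 1) if e == 1 else (-1, 1)
        transfer[i + 1] = t      # each split satisfies e == 2*t + s
        interim[i] = s
    out = [interim[i] + transfer[i] for i in range(n + 1)]
    return out[::-1]
```

Each output digit depends only on the input digit at its own position and the two positions below it, so no carry chain is needed.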
A novel implementation of radix-4 floating-point division/square-root using comparison multiples
Computers & Electrical Engineering, 2010
A new implementation for minimally redundant radix-4 floating-point SRT div/sqrt (division/square-root) with the recurrence in the signed-digit format is introduced. The implementation is developed based on the comparison multiples idea. In the proposed approach, the magnitude of the quotient (root) digit is calculated by comparing the truncated partial remainder with two limited-precision multiples of the divisor (partial root). The digit sign is determined by investigating the polarity of the truncated partial remainder. A timing evaluation using logic synthesis (Synopsys DC with Artisan 0.18 μm typical library) shows a latency of 2.5 ns for the recurrence of the proposed div/sqrt. This is less than that of the conventional implementation.
Fast Radix-10 Multiplication Using Binary Input and Convert into Decimal Codes
2017
We present the algorithm and architecture of a BCD parallel multiplier that exploits some properties of two different redundant BCD codes to speed up its computation: the redundant BCD excess-3 code (XS-3) and the overloaded BCD representation (ODDS). The partial products can be recoded to the ODDS representation by just adding a constant factor into the partial product reduction tree. To show the advantages of our architecture, we have synthesized an RTL model for 16 × 16-digit and 34 × 34-digit multiplications and performed a comparative survey of the previous most representative designs. New techniques are developed to reduce significantly the latency and area of previous representative high-performance implementations. Partial products are generated in parallel using a signed-digit radix-10 recoding of the BCD multiplier with the digit set [-5, 5] and a set of positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) coded in XS-3.