High-speed double-precision computation of reciprocal, division, square root, and inverse square root

An Efficient Hardware Implementation for a Reciprocal Unit

2010 Fifth IEEE International Symposium on Electronic Design, Test & Applications, 2010

The computation of the reciprocal of a numerical value is an important ingredient of many algorithms. We present a compact hardware architecture to compute reciprocals by two or three Newton-Raphson iterations to obtain the accuracy of the IEEE 754 single- and double-precision standards, respectively. We estimate the initialization value by a specially designed second-order polynomial approximating the reciprocal. By using a second-order polynomial, we succeed in using one single hardware architecture for both the polynomial-approximation computations and the Newton-Raphson iterations. Therefore, we obtain a most compact hardware implementation for the complete reciprocal computation.
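
The scheme described in this abstract (a quadratic initial estimate refined by Newton-Raphson steps) can be sketched in software as follows. This is only an illustration under stated assumptions: the quadratic used here is a plain Taylor expansion of 1/m about m = 1.5, not the paper's specially designed polynomial, and the function name and the `iterations` parameter are hypothetical. Because this seed is cruder than a tuned one, the sketch defaults to four iterations rather than the two or three quoted above.

```python
import math

def reciprocal(d, iterations=4):
    """Approximate 1/d with a quadratic seed plus Newton-Raphson steps.

    Assumption: the seed is a Taylor polynomial about 1.5, chosen for
    simplicity; it is not the polynomial from the paper.
    """
    if d == 0:
        raise ZeroDivisionError("reciprocal of zero")
    # Split |d| = m * 2**e with m in [0.5, 1), then rescale m into [1, 2).
    m, e = math.frexp(abs(d))
    m *= 2.0
    e -= 1
    # Second-order seed: 1/m = (2/3) / (1 + u) ~ (2/3) * (1 - u + u^2).
    u = (2.0 / 3.0) * (m - 1.5)
    x = (2.0 / 3.0) * (1.0 - u + u * u)
    # Newton-Raphson step for f(x) = 1/x - m: x <- x * (2 - m * x).
    # The error 1 - m*x is squared by every step.
    for _ in range(iterations):
        x = x * (2.0 - m * x)
    # Undo the scaling and restore the sign: 1/d = (1/m) * 2**(-e).
    return math.copysign(math.ldexp(x, -e), d)
```

Both the seed and the refinement step use only multiplications and additions, which is in the spirit of the abstract's point that a single multiply-add datapath can serve both phases.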

Fast VLSI algorithms for division and square root

Journal of VLSI Signal Processing, 1994

Real time digital signal processing demands high performance implementations of division and square root. This can only be achieved by the design of fast and efficient arithmetic algorithms which address practical VLSI architectural design issues. In this paper, new algorithms for division and square root are described. The new schemes are based on pre-scaling the operands and modifying the classical SRT method such that the result digits and the remainders are computed concurrently and the computations in adjacent rows are overlapped. Consequently, their performance exceeds that of the SRT methods. The hardware cost for higher radices is considerably more than that of the SRT methods but for many applications, this is not prohibitive. A system of equations is presented which enables both an analysis of the method for any radix and the parameters of implementations to be easily determined. This is illustrated for the case of radix 2 and radix 4. In addition, a highly regular array architecture combining the division and square root method is described.

Series approximation methods for divide and square root in the Power3™ processor

Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336), 2000

The Power3 processor is a 64-bit implementation of the PowerPC™ architecture and is the successor to the Power2™ processor for workstations and servers which require high performance floating point capability. The previous processors used Newton-Raphson algorithms for their implementations of divide and square root. The Power3 processor has a longer pipeline latency, which would substantially increase the latency for these instructions. Instead, new algorithms based on power series approximations were developed which provide significantly better performance than the Newton-Raphson algorithm for this processor.

Optimized Floating Point Square-root

2018

In the present digital world, fast and resource-optimized execution of basic mathematical operations such as multiplication, division, and square root plays an important role. Many algorithms require computing a square root. After addition, subtraction, multiplication, and division, square root is the most important mathematical operation. Therefore, this paper presents a fast, resource-optimized floating-point square-root algorithm. Three different algorithms, namely 1) a non-restoring algorithm, 2) an IEEE 754 floating-point square-root algorithm, and 3) a logarithmic square-root algorithm, are implemented on the Xilinx Spartan 3E and compared for resource utilization and execution clocks. The comparison shows that the IEEE 754 floating-point square-root algorithm performs best, with a throughput of 50 MSPS while consuming 60% fewer resources than the logarithmic square-root algorithm.

Simplified floating-point division and square root

2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Digital Signal Processing (DSP) algorithms on low-power embedded platforms are often implemented using fixed-point arithmetic due to expected power and area savings over floating-point computation. However, recent research shows that floating-point arithmetic can be made competitive by using a reduced-precision format instead of, e.g., IEEE standard single precision, thereby avoiding the algorithm design and implementation difficulties associated with fixed-point arithmetic. This paper investigates the effects of simplified floating-point arithmetic applied to an FMA-based floating-point unit and the associated software division and square root operations. Software operations are proposed which attain near-exact precision with twice the performance of exact algorithms and resolve overflow-related errors with inexpensive exponent-manipulation special instructions.

A novel implementation of radix-4 floating-point division/square-root using comparison multiples

Computers & Electrical Engineering, 2010

A new implementation for minimally redundant radix-4 floating-point SRT div/sqrt (division/square-root) with the recurrence in the signed-digit format is introduced. The implementation is developed based on the comparison multiples idea. In the proposed approach, the magnitude of the quotient (root) digit is calculated by comparing the truncated partial remainder with 2 limited precision multiples of the divisor (partial root). The digit sign is determined by investigating the polarity of the truncated partial remainder. A timing evaluation using the logical synthesis (Synopsys DC with Artisan 0.18 μm typical library) shows a latency of 2.5 ns for the recurrence of the proposed div/sqrt. This is less than of the conventional implementation.

FPGA Implementation of Low-Area Square Root Calculator

TELKOMNIKA (Telecommunication Computing Electronics and Control), 2015

Square root is one of the mathematical operations widely used in digital signal processing. Implementing it in hardware such as an FPGA provides several advantages compared to the performance offered in software. Several algorithms can be used for this calculation, but they are difficult to implement on an FPGA. This paper presents a model of an FPGA-based square root calculator which requires very low resource usage, thus occupying very little FPGA area. The model is designed to suit the needs of medium-speed and low-speed applications which do not need very high processing speed, while optimizing the number of resources utilized. The modified non-restoring algorithm is used in this design to compute the square root. The design is coded in RTL VHDL and implemented on an Altera DE2 board for hardware validation. The implementation produced very precise square root calculation, with low-latency computation and low area consumption, for the various input data widths tested.

Novel Pipelined Architecture for Efficient Evaluation of the Square Root Using a Modified Non-Restoring Algorithm

The square root is a basic arithmetic operation in image and signal processing. We present a novel pipelined architecture to implement N-bit fixed-point square root operation on an FPGA using a non-restoring pipelined algorithm that does not require floating-point hardware. Pipelining hazards in its hardware realization are avoided by modifying the classic non-restoring algorithm, thus resulting in a 13% improved latency. Furthermore, the proposed architecture is flexible allowing modification as per individual application needs. It is demonstrated that the proposed architecture is approximately four times faster than its popular counterparts and at the same time it consumes 50% less energy for envelope detection at 268 MHz sampling rate.
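
For readers unfamiliar with the baseline, the classic (unmodified) non-restoring integer square root can be sketched in a few lines. This is the textbook digit-by-digit form, not the pipelined variant the abstract proposes; the function name and the `root_bits` parameter are hypothetical.

```python
def nonrestoring_isqrt(d, root_bits):
    """Classic non-restoring square root of a non-negative integer.

    Returns (q, r) with q = floor(sqrt(d)) and r = d - q*q.
    d must fit in 2 * root_bits bits.
    """
    if d < 0 or d >= 1 << (2 * root_bits):
        raise ValueError("d must satisfy 0 <= d < 4**root_bits")
    q = 0  # partial root, grows by one bit per step
    r = 0  # partial remainder, allowed to go negative
    for i in range(root_bits - 1, -1, -1):
        pair = (d >> (2 * i)) & 3  # next two radicand bits
        if r >= 0:
            # Try to subtract: append the bits, subtract (4q + 1).
            r = 4 * r + pair - (4 * q + 1)
        else:
            # Previous subtraction overshot: add (4q + 3) instead of
            # restoring the remainder first.
            r = 4 * r + pair + (4 * q + 3)
        # The sign of the new remainder selects the next root bit.
        q = 2 * q + (1 if r >= 0 else 0)
    if r < 0:
        # Single final correction so the remainder is non-negative.
        r += 2 * q + 1
    return q, r
```

Each step performs exactly one add or subtract with no restore step, which is what makes the method attractive for compact hardware; for example, `nonrestoring_isqrt(10, 2)` walks through two steps and returns root 3 with remainder 1.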

High Performance Novel Square Root Architecture Using Ancient Indian Mathematics for High Speed Signal Processing

Advances in Pure Mathematics, 2015

A novel high-speed, energy-efficient square root architecture is reported in this paper. In this architecture, we have blended ancient Indian Vedic mathematics and Bakhshali mathematics to achieve a significant amount of accuracy in performing the square root operation. Specifically, the Vedic Duplex method and the iterative division method reported in the Bakhshali Manuscript have been used for the computation. The proposed technique has been compared with the well-known Newton-Raphson (N-R) technique for square root computation. The algorithm has been implemented and tested using the Modelsim simulator, and performance parameters such as the number of lookup tables, propagation delay, and power consumption have been estimated using the Xilinx ISE simulator. The functionality of the circuitry has been checked using a Xilinx Virtex-5 FPGA board.
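
The Bakhshali iteration mentioned in this abstract has a simple floating-point form, sketched below. Only the commonly cited Bakhshali update is shown; the Vedic Duplex stage and the paper's hardware mapping are not reproduced, and the function name, the default starting guess, and the `iterations` parameter are assumptions made for illustration.

```python
def bakhshali_sqrt(s, x0=None, iterations=6):
    """Approximate sqrt(s) with the Bakhshali iteration.

    Assumption: the default starting guess max(s, 1.0) is a crude
    placeholder; a hardware design would use a better seed.
    """
    if s < 0:
        raise ValueError("square root of a negative number")
    if s == 0:
        return 0.0
    x = float(x0) if x0 is not None else max(s, 1.0)
    for _ in range(iterations):
        a = (s - x * x) / (2.0 * x)    # Newton-style correction
        b = x + a                      # result of one Newton-Raphson step
        x = b - (a * a) / (2.0 * b)    # Bakhshali refinement of that step
    return x
```

One Bakhshali update is algebraically equivalent to two consecutive Newton-Raphson steps, so it converges in roughly half as many iterations at the cost of more arithmetic per iteration.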

High-level algorithms for correctly-rounded reciprocal square roots

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)

We analyze two fast and accurate algorithms recently presented by Borges for computing x^(-1/2) in binary floating-point arithmetic (assuming that efficient and correctly-rounded FMA and square root are available). The first algorithm is based on the Newton-Raphson iteration, and the second one uses an order-3 iteration. We give attainable relative-error bounds for these two algorithms, build counterexamples showing that in very rare cases they do not provide a correctly-rounded result, and characterize precisely when such failures happen in IEEE 754 binary32 and binary64 arithmetics. We then give a generic (i.e., precision-independent) algorithm that always returns a correctly-rounded result, and show how it can be simplified and made more efficient in the important cases of binary32 and binary64.
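
As background for the Newton-Raphson variant mentioned above, the textbook iteration for the reciprocal square root is sketched here. It is neither of the Borges algorithms analyzed in the paper, it uses no FMA or hardware square root, and it makes no claim of correct rounding; the function name, the power-of-two seed, and the `iterations` parameter are assumptions made for illustration.

```python
import math

def rsqrt_newton(x, iterations=8):
    """Approximate x**-0.5 with the textbook Newton-Raphson iteration.

    Not correctly rounded; the result is typically within a few ulps.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    # Split x = m * 2**e with m in [0.5, 1) and seed with a power of two
    # close to 2**(-e/2). The seed lies between about 0.71 and 1.42 times
    # the true value, inside the convergence region of the iteration.
    _, e = math.frexp(x)
    y = math.ldexp(1.0, -(e // 2))
    for _ in range(iterations):
        # Newton step for f(y) = 1/y**2 - x, written as a correction:
        # y <- y + y * (1 - x*y*y) / 2.
        residual = 1.0 - x * y * y
        y = y + 0.5 * y * residual
    return y
```

Writing the step as a small correction added to `y` mirrors the structure that FMA-based algorithms exploit: the residual `1 - x*y*y` is the quantity an FMA can evaluate with a single rounding.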