Handbook of Floating-Point Arithmetic
Related papers
Acta Numerica
Floating-point numbers have an intuitive meaning when it comes to physics-based numerical computations, and they have thus become the most common way of approximating real numbers in computers. The IEEE-754 Standard has played a large part in making floating-point arithmetic ubiquitous today, by specifying its semantics in a strict yet useful way as early as 1985. In particular, floating-point operations should be performed as if their results were first computed with an infinite precision and then rounded to the target format. A consequence is that floating-point arithmetic satisfies the ‘standard model’ that is often used for analysing the accuracy of floating-point algorithms. But that is only scraping the surface, and floating-point arithmetic offers much more. In this survey we recall the history of floating-point arithmetic as well as its specification mandated by the IEEE-754 Standard. We also recall what properties it entails and what every programmer should know when designi...
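The ‘standard model’ mentioned in this abstract says that each correctly rounded operation satisfies fl(a op b) = (a op b)(1 + δ) with |δ| ≤ u, where u = 2⁻⁵³ for binary64. A minimal sketch of checking that bound, using Python's exact `Fraction` type as the infinite-precision reference:

```python
from fractions import Fraction

# Unit roundoff for round-to-nearest binary64.
u = 2.0 ** -53

def relative_rounding_error(a: float, b: float) -> float:
    """Relative error of the floating-point sum against the exact rational sum."""
    exact = Fraction(a) + Fraction(b)  # floats convert to exact rationals
    computed = Fraction(a + b)         # one correctly rounded operation
    if exact == 0:
        return 0.0
    return abs(float((computed - exact) / exact))

delta = relative_rounding_error(0.1, 0.2)
print(delta <= u)  # the standard model guarantees this bound holds
```

Here 0.1 + 0.2 is not exact in binary64, so δ is nonzero, yet it stays below u as the standard model requires.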
Precision Arithmetic: A New Floating-Point Arithmetic
2006
A new deterministic floating-point arithmetic called precision arithmetic is developed to track precision through arithmetic calculations. It uses a novel rounding scheme to avoid the excessive rounding-error propagation of conventional floating-point arithmetic. Unlike interval arithmetic, its uncertainty tracking is based on statistics and the central limit theorem, giving a much tighter bounding range. Its stable rounding-error distribution is approximated by a truncated normal distribution. Generic standards and systematic methods for validating uncertainty-bearing arithmetics are discussed. Precision arithmetic is found to be better than interval arithmetic in both uncertainty tracking and uncertainty bounding for normal usage. The precision arithmetic is available publicly at this http URL.
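The contrast drawn here between statistical and worst-case uncertainty tracking can be illustrated with a hypothetical sketch (this is not the paper's actual algorithm): under an independence assumption, variances add, so a combined uncertainty grows like √n, while interval half-widths add directly and grow like n.

```python
import math

def add_statistical(pairs):
    """Sum (value, std_dev) pairs, assuming independent errors (CLT-style)."""
    value = sum(v for v, _ in pairs)
    std = math.sqrt(sum(s * s for _, s in pairs))
    return value, std

def add_interval(pairs):
    """Sum (value, half_width) pairs with worst-case interval bounds."""
    value = sum(v for v, _ in pairs)
    half_width = sum(w for _, w in pairs)
    return value, half_width

pairs = [(1.0, 1e-6)] * 100
_, stat = add_statistical(pairs)  # sqrt(100) * 1e-6 = 1e-5
_, ival = add_interval(pairs)     # 100 * 1e-6 = 1e-4
print(stat < ival)  # the statistical bound is tighter
```

This is only a toy model of why a statistics-based bound can be much tighter than an interval bound; the paper's scheme additionally models the rounding-error distribution itself.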
Floating-point on-line arithmetic: Algorithms
1981 IEEE 5th Symposium on Computer Arithmetic (ARITH), 1981
For effective application of on-line arithmetic to practical numerical problems, floating-point algorithms for on-line addition/subtraction and multiplication have been implemented by introducing the notion of quasi-normalization. The proposed algorithms are normalized, fixed-precision FLPOL (floating-point on-line) algorithms.
IJERT-Double Precision Floating Point Arithmetic Unit Implementation- A Review
International Journal of Engineering Research and Technology (IJERT), 2015
https://www.ijert.org/double-precision-floating-point-arithmetic-unit-implementation-a-review https://www.ijert.org/research/double-precision-floating-point-arithmetic-unit-implementation-a-review-IJERTV4IS070766.pdf
Arithmetic circuits form an important class of circuits in digital systems. Since the invention of FPGAs, increases in their size and performance have allowed designers to use FPGAs for more complex designs. Very large and very small numbers are hard to represent in a fixed-point unit, so such values are represented using the IEEE-754 standard floating-point format. A floating-point unit performs mathematical operations such as addition, subtraction, etc. This paper reviews various floating-point arithmetic unit implementations based on the IEEE-754 standard.
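The double-precision encoding that such arithmetic units operate on splits each 64-bit word into a sign bit, an 11-bit biased exponent, and a 52-bit fraction. A small sketch of unpacking those fields:

```python
import struct

def decode_binary64(x: float):
    """Split an IEEE 754 binary64 value into its sign, biased exponent,
    and fraction fields."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF    # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)  # 52-bit fraction field
    return sign, exponent, fraction

print(decode_binary64(1.0))   # (0, 1023, 0): the exponent bias is 1023
print(decode_binary64(-2.0))  # (1, 1024, 0)
```

Hardware adders and multipliers work directly on these integer fields, which is why fixed-point and integer primitives underlie every floating-point unit.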
Hardware Implementation of Floating-Point Arithmetic
Springer eBooks, 2018
The previous chapter has shown that operations on floating-point numbers are naturally expressed in terms of integer or fixed-point operations on the significand and the exponent. For instance, to obtain the product of two floating-point numbers, one basically multiplies the significands and adds the exponents. However, obtaining the correct rounding of the result may require considerable design effort and the use of nonarithmetic primitives such as leading-zero counters and shifters. This chapter details the implementation of these algorithms in hardware, using digital logic. Describing in full detail all the possible hardware implementations of the needed integer arithmetic primitives is beyond the scope of this book; the interested reader will find this information in the textbooks on the subject [345, 483, 187]. After an introduction to the context of hardware floating-point implementation in Section 8.1, we briefly review these primitives in Section 8.2, discuss their cost in terms of area and delay, and then focus on wiring them together in the rest of the chapter. We assume in this chapter that inputs and outputs are encoded according to the IEEE 754-2008 Standard for Floating-Point Arithmetic.
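The multiply-the-significands, add-the-exponents scheme described above can be sketched in software. This toy model uses Python's `frexp`/`ldexp`, which express x as m · 2ᵉ with 0.5 ≤ |m| < 1, standing in for the hardware significand/exponent split (real hardware keeps a wider significand product before rounding, which this sketch glosses over):

```python
import math

def toy_fp_mul(x: float, y: float) -> float:
    """Toy floating-point multiply: combine significands and exponents,
    then renormalize."""
    mx, ex = math.frexp(x)
    my, ey = math.frexp(y)
    m = mx * my                       # significand product (may drop below 0.5)
    e = ex + ey                       # exponents add
    if abs(m) < 0.5 and m != 0.0:
        m *= 2.0                      # renormalize: shift significand left ...
        e -= 1                        # ... and decrement the exponent
    return math.ldexp(m, e)

print(toy_fp_mul(3.0, 4.0))  # 12.0
```

The renormalization branch is the software analogue of the shifter mentioned above; correct rounding of the wide significand product is precisely where the hardware design effort goes.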
A New Floating Point Arithmetic with Error Tracking Capability
2006
A new modeless floating-point arithmetic, called precision arithmetic, is developed to track, limit, and reject the accumulation of calculation errors during floating-point calculations, by reinterpreting the polymorphic representation of conventional floating-point arithmetic. The validity of this strategy is demonstrated by tracking the calculation errors and rejecting the meaningless results of a few representative algorithms under various conditions. With this arithmetic, each algorithm seems to have a constant error ratio and a constant degradation ratio regardless of input data, and the error in the significand seems to propagate very slowly, according to a constant exponential distribution specific to the algorithm. In addition, an unfaithful artifact of the discrete Fourier transform is discussed.
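The slow error propagation this abstract describes can be observed directly (this is an illustration only, not the paper's precision arithmetic): run an accumulation in ordinary binary64 and compare it against an exact rational reference built from the same float inputs.

```python
from fractions import Fraction

def accumulated_error(values):
    """Exact accumulated rounding error of a naive running float sum."""
    approx = 0.0
    exact = Fraction(0)
    for v in values:
        approx += v           # rounded at every step
        exact += Fraction(v)  # exact rational accumulation
    return abs(Fraction(approx) - exact)

err = accumulated_error([0.1] * 1000)
print(err > 0)             # rounding errors have accumulated ...
print(float(err) < 1e-10)  # ... but only very slowly
```

A thousand additions of an inexactly represented value leave an error far below the worst-case bound, consistent with the slow, distribution-governed propagation the paper reports.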
Implementation of the IEEE Standard Binary Floating-Point Arithmetic Unit
A floating-point system can be used to represent, with a fixed number of digits, numbers of different orders of magnitude: e.g. the distance between galaxies and the diameter of an atomic nucleus can be expressed with the same unit of length. The result of this dynamic range is that the numbers that can be represented are not uniformly spaced; the difference between two consecutive representable numbers grows with the chosen scale. Over the years, a variety of floating-point representations have been used in computers. However, since the 1990s, the most commonly encountered representation is that defined by the IEEE 754 Standard. The speed of floating-point operations, commonly measured in FLOPS, is an important characteristic of a computer system. Hence, in this work the IEEE standard binary floating-point arithmetic is implemented.
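The non-uniform spacing described above is easy to measure: the gap between consecutive binary64 numbers (one ulp) grows with magnitude. A minimal sketch using `math.ulp` (Python 3.9+):

```python
import math

# One ulp (unit in the last place) at increasing magnitudes: the spacing
# between consecutive representable numbers grows with the scale.
for x in (1.0, 1e8, 1e16):
    print(x, math.ulp(x))

# At 1e16 the spacing already exceeds 1, so adding 1.0 is absorbed:
print(1e16 + 1.0 == 1e16)  # True
```

This absorption effect is exactly why accuracy analyses reason about relative rather than absolute error in floating-point systems.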