Numeric Considerations for Native Floating-Point - MATLAB & Simulink

Native floating-point technology can generate HDL code from your floating-point design. Floating-point designs have better precision, a higher dynamic range, and a shorter development cycle than fixed-point designs. If your design has complex mathematical operations, use native floating-point technology.

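HDL Coder™ generates code that complies with the IEEE® 754 standard for floating-point arithmetic. HDL Coder native floating-point supports rounding to the nearest even digit, denormal numbers, exception values such as infinity, NaN, and zero, and relative accuracy and ULP measures. The following sections describe each of these.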

Nearest Even Digit Rounding

HDL Coder native floating-point supports rounding to the nearest even digit. This mode resolves all ties by rounding to the nearest even digit.

This rounding method requires at least three trailing bits after the 23 bits of the single-precision mantissa. The MSB is called the Guard bit, the middle bit is called the Round bit, and the LSB is called the Sticky bit. The table shows the rounding action that HDL Coder performs for each combination of the three trailing bits. An x denotes a don’t care value, which can be either 0 or 1.

| Rounding bits (Guard, Round, Sticky) | Rounding action |
| --- | --- |
| 0xx | No action performed. |
| 100 | A tie. If the mantissa bit that precedes the Guard bit is a 1, round up; otherwise, no action is performed. |
| 101 | Round up. |
| 11x | Round up. |
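The decision in the table above can be sketched in a few lines of Python. This is only an illustration of the rounding rule, not the logic that HDL Coder generates; the function name `round_nearest_even` and its integer-based interface are assumptions made for this example.

```python
def round_nearest_even(mantissa: int, grs: int) -> int:
    """Round the kept mantissa bits using the three trailing bits.

    `mantissa` holds the kept bits as an integer. `grs` is a 3-bit value
    with the Guard bit as MSB, the Round bit in the middle, and the
    Sticky bit as LSB. (Hypothetical helper, for illustration only.)
    """
    guard = (grs >> 2) & 1
    round_or_sticky = grs & 0b011
    lsb = mantissa & 1  # the mantissa bit that precedes the Guard bit

    if guard == 0:
        return mantissa        # 0xx: below the halfway point, no action
    if round_or_sticky:
        return mantissa + 1    # 101 or 11x: above the halfway point, round up
    return mantissa + lsb      # 100: tie, round up only if the LSB is 1


# Show every table row for an even (1010) and an odd (1011) mantissa.
for bits in (0b000, 0b011, 0b100, 0b101, 0b110):
    even = round_nearest_even(0b1010, bits)
    odd = round_nearest_even(0b1011, bits)
    print(f"{bits:03b}: 1010 -> {even:04b}, 1011 -> {odd:04b}")
```

The sketch ignores the case where rounding up overflows the mantissa (all ones plus one), which in a real datapath also requires incrementing the exponent.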

Denormal Numbers

Denormal numbers are numbers that have an exponent field equal to zero and a nonzero mantissa field. The leading bit of the mantissa is zero.

Denormal numbers have magnitudes less than the smallest floating-point number that can be represented without leading zeros in the mantissa. The presence of denormal numbers indicates a loss of significant digits that can accumulate over multiple operations and result in unexpected values.

The logic that HDL Coder uses to handle denormal numbers involves counting the number of leading zeros and performing a left shift operation to obtain the normalized representation. Addition of this logic increases the area footprint on the target device and can affect the timing of your design.
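As a rough software analogue of that leading-zero count and left shift, the following sketch normalizes the mantissa field of a single-precision denormal. It is not the generated HDL; the function name `normalize_denormal` and the returned (significand, exponent) pair are assumptions chosen for illustration.

```python
def normalize_denormal(mantissa: int):
    """Normalize the 23-bit mantissa field of a single-precision denormal.

    Returns (significand, exponent), where `significand` is a 24-bit
    integer with the leading 1 in bit 23 and `exponent` is the unbiased
    exponent, so that value == significand * 2**(exponent - 23).
    (Hypothetical helper, for illustration only.)
    """
    if not 0 < mantissa < (1 << 23):
        raise ValueError("expected a nonzero 23-bit mantissa field")

    # Count the zeros above the first 1 inside the 23-bit field.
    leading_zeros = 23 - mantissa.bit_length()
    # Shift the first 1 into the hidden-bit position (bit 23).
    shift = leading_zeros + 1
    significand = mantissa << shift
    # A denormal has a fixed exponent of -126; each shift lowers it by 1.
    exponent = -126 - shift
    return significand, exponent


sig, exp = normalize_denormal(0x000001)   # smallest positive denormal
print(hex(sig), exp)                      # leading 1 now sits in bit 23
```

The shift amount depends on the data, which is why hardware needs a leading-zero counter and a variable shifter for this step.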

When you use native floating-point support, you can specify whether your design handles denormal numbers.

Exception Handling

If you perform operations such as division by zero or computing the logarithm of a negative number, HDL Coder detects and reports exceptions. This table summarizes the mapping from the encoding of a single-precision floating-point number to its value for the different cases. An x denotes a don’t care value, which can be 0 or 1 without affecting the mapping.

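| Sign | Exponent | Significand | Value | Description |
| --- | --- | --- | --- | --- |
| x | 0xFF | 0x00000000 | $\text{value} = (-1)^{\text{sign}} \cdot \infty$ | Infinity |
| x | 0xFF | A nonzero value | $\text{value} = \text{NaN}$ | Not a Number |
| x | 0x00 | 0x00000000 | $\text{value} = 0$ | Zero |
| x | 0x00 | A nonzero value | $\text{value} = (-1)^{\text{sign}} \left(0 + \sum_{i=1}^{23} b_{23-i}\, 2^{-i}\right) 2^{-126}$ | Denormal |
| x | 0x00 < e < 0xFF | x | $\text{value} = (-1)^{\text{sign}} \left(1 + \sum_{i=1}^{23} b_{23-i}\, 2^{-i}\right) 2^{(e-127)}$ | Normal |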
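The mapping in the table above can be checked with a small decoder. This is a minimal sketch that assumes a 32-bit single-precision pattern passed as a Python integer; the function name `decode_single` is hypothetical and is not part of HDL Coder.

```python
import math
import struct


def decode_single(bits: int):
    """Classify a 32-bit pattern and compute its value from its fields.

    Returns a (description, value) pair that follows the table rows.
    (Hypothetical helper, for illustration only.)
    """
    sign = (bits >> 31) & 0x1
    e = (bits >> 23) & 0xFF          # biased exponent field
    m = bits & 0x7FFFFF              # 23-bit significand field
    fraction = m / 2 ** 23           # equals the sum of b_(23-i) * 2**-i

    if e == 0xFF:
        if m == 0:
            return "Infinity", (-1) ** sign * math.inf
        return "NaN", math.nan
    if e == 0x00:
        if m == 0:
            return "Zero", (-1) ** sign * 0.0
        return "Denormal", (-1) ** sign * (0 + fraction) * 2.0 ** -126
    return "Normal", (-1) ** sign * (1 + fraction) * 2.0 ** (e - 127)


# Cross-check one normal value against Python's own IEEE 754 decoding.
pattern = 0x40490FDB
reference = struct.unpack(">f", struct.pack(">I", pattern))[0]
print(decode_single(pattern), reference)
```

Because every single-precision value is exactly representable as a Python float (a double), the formula-based result and the `struct` result compare equal for normal and denormal inputs.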

Relative Accuracy and ULP Considerations

Representing infinitely many real numbers with a finite number of bits requires approximation. This approximation can introduce rounding errors into floating-point computations. To measure these rounding errors, the floating-point standard uses a relative error and a ULP (unit in the last place) error.

ULP

If the exponent range is not upper-bounded, the ULP value of a floating-point number x is the distance between the two closest floating-point numbers a and b that straddle x. The IEEE 754 standard requires that the result of an elementary arithmetic operation such as addition, multiplication, or division is correctly rounded. A correctly rounded result is within 0.5 ULP of the exact result.

A ULP value of 1 corresponds to adding 1 to the decimal value of the stored integer, that is, the bit pattern read as an unsigned integer. This table shows the single-precision approximation of π to nine decimal digits and how a change of 1 ULP changes the approximate value.

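| Floating-point number | Decimal value of stored integer | IEEE 754 single-precision representation (sign, exponent, mantissa) |
| --- | --- | --- |
| 3.141592741 | 1078530011 | 0 10000000 10010010000111111011011 |
| 3.141592979 | 1078530012 | 0 10000000 10010010000111111011100 |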
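The relationship between the stored integer and a step of 1 ULP can be reproduced with the Python standard library. This is a sketch for positive finite single-precision values only; the helper names `float_to_bits` and `bits_to_float` are assumptions introduced for this example.

```python
import struct


def float_to_bits(x: float) -> int:
    """Round x to single precision and return its 32-bit stored integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(bits: int) -> float:
    """Interpret a 32-bit stored integer as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


pi_bits = float_to_bits(3.141592741)
next_bits = pi_bits + 1              # adding 1 moves the value up by 1 ULP

for bits in (pi_bits, next_bits):
    pattern = format(bits, "032b")
    # Print value, stored integer, and the sign / exponent / mantissa fields.
    print(f"{bits_to_float(bits):.9f}", bits,
          pattern[0], pattern[1:9], pattern[9:])
```

Stepping by 1 in the stored integer works because, for positive values, the IEEE 754 bit patterns are ordered the same way as the numbers they encode.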

The gap between two consecutive representable floating-point numbers varies with magnitude. Near π, the gap between single-precision values is about 2.38e-07, whereas near 1234567 it is 0.125.

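| Floating-point number | Decimal value of stored integer | IEEE 754 single-precision representation (sign, exponent, mantissa) |
| --- | --- | --- |
| 1234567 | 1234613304 | 0 10010011 00101101011010000111000 |
| 1234567.125 | 1234613305 | 0 10010011 00101101011010000111001 |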
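One way to see why the gap grows with magnitude is to derive it from the exponent field: for a normal single-precision value with biased exponent e, the spacing is 2^(e − 127 − 23). The sketch below assumes positive normal inputs, and the function name `ulp_single` is hypothetical.

```python
import struct


def ulp_single(x: float) -> float:
    """Return the gap between x and the next larger single-precision value.

    Valid for normal values only. (Hypothetical helper, for illustration.)
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    e = (bits >> 23) & 0xFF                  # biased exponent field
    if not 0x00 < e < 0xFF:
        raise ValueError("expected a normal single-precision value")
    # The 23 mantissa bits split the interval [2**(e-127), 2**(e-126)) evenly.
    return 2.0 ** (e - 127 - 23)


for value in (1.0, 3.141592741, 1234567.0):
    print(value, ulp_single(value))
```

Every value that shares an exponent field has the same gap, so the spacing doubles each time the magnitude crosses a power of two.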

Relative Error

Relative error measures the difference between a floating-point number and the real number that it approximates, relative to the magnitude of the numbers. The relative error between the real numbers a and b is the ratio of the absolute difference between a and b to the maximum of a and b.

This table shows the relative error between two consecutive floating-point values, which differ by 1 ULP.

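| Floating-point number | Decimal value of stored integer | IEEE 754 single-precision representation (sign, exponent, mantissa) | Relative error |
| --- | --- | --- | --- |
| 1234567 | 1234613304 | 0 10010011 00101101011010000111000 | 0.125 / 1234567.125 ≈ 1.0125e-07 |
| 1234567.125 | 1234613305 | 0 10010011 00101101011010000111001 | |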
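The relative error for the two values in the table can be computed directly from the definition above. This is a minimal sketch; the function name `relative_error` is hypothetical, and taking absolute values inside `max` is an assumption added so that the ratio stays meaningful for negative inputs.

```python
def relative_error(a: float, b: float) -> float:
    """Ratio of |a - b| to the larger magnitude of a and b.

    Uses abs() inside max() as an assumption to cover negative inputs.
    (Hypothetical helper, for illustration only.)
    """
    return abs(a - b) / max(abs(a), abs(b))


# Two consecutive single-precision values that differ by 1 ULP (0.125).
err = relative_error(1234567.0, 1234567.125)
print(f"{err:.4e}")

# For normal single-precision values, a 1-ULP step gives a relative
# error between 2**-24 and 2**-23.
print(2.0 ** -24 < err <= 2.0 ** -23)
```

Unlike the absolute gap, the relative error of a 1-ULP step stays within a factor of two across the whole normal range, which is why it is a convenient accuracy measure.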
