6 Basics [basic]

6.8 Types [basic.types]

6.8.3 Optional extended floating-point types [basic.extended.fp]

If the implementation supports an extended floating-point type ([basic.fundamental]) whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary16, then the typedef-name std::float16_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT16_T__ is defined ([cpp.predefined]), and the floating-point literal suffixes f16 and F16 are supported ([lex.fcon]).

If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary32, then the typedef-name std::float32_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT32_T__ is defined, and the floating-point literal suffixes f32 and F32 are supported.

If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary64, then the typedef-name std::float64_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT64_T__ is defined, and the floating-point literal suffixes f64 and F64 are supported.

If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary128, then the typedef-name std::float128_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT128_T__ is defined, and the floating-point literal suffixes f128 and F128 are supported.

If the implementation supports an extended floating-point type with the properties, as specified by ISO/IEC 60559, of radix (b) of 2, storage width in bits (k) of 16, precision in bits (p) of 8, maximum exponent (emax) of 127, and exponent field width in bits (w) of 8, then the typedef-name std::bfloat16_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_BFLOAT16_T__ is defined, and the floating-point literal suffixes bf16 and BF16 are supported.
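As an illustration of how the macros and typedef-names described above might be used together, the following sketch probes for each optional type at compile time. The helper names `supported_extended_fp_types` and `one_half_f32` and the macro `SKETCH_HAS_STDFLOAT` are hypothetical and exist only for this example. The sketch assumes a toolchain that provides `__has_include`, and it is written to compile even when none of the types is available.

```cpp
#include <string>
#include <vector>

// <stdfloat> may be absent on older toolchains, so probe for it first.
// SKETCH_HAS_STDFLOAT is a hypothetical macro used only in this sketch.
#if defined(__has_include)
#  if __has_include(<stdfloat>)
#    include <stdfloat>
#    define SKETCH_HAS_STDFLOAT 1
#  endif
#endif

// Hypothetical helper: lists the optional typedef-names for which the
// implementation defines the corresponding predefined macro.
inline std::vector<std::string> supported_extended_fp_types() {
    std::vector<std::string> names;
#if defined(__STDCPP_FLOAT16_T__)
    names.push_back("std::float16_t");
#endif
#if defined(__STDCPP_FLOAT32_T__)
    names.push_back("std::float32_t");
#endif
#if defined(__STDCPP_FLOAT64_T__)
    names.push_back("std::float64_t");
#endif
#if defined(__STDCPP_FLOAT128_T__)
    names.push_back("std::float128_t");
#endif
#if defined(__STDCPP_BFLOAT16_T__)
    names.push_back("std::bfloat16_t");
#endif
    return names;
}

#if defined(__STDCPP_FLOAT32_T__) && defined(SKETCH_HAS_STDFLOAT)
// The f32 suffix and the typedef-name are used only when the macro
// indicates that the type is supported.
inline std::float32_t one_half_f32() { return 0.5f32; }
#endif
```

Calling `supported_extended_fp_types()` returns between zero and five names, depending on the implementation and the language mode in use.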

[Note 1:

A summary of the parameters for each type is given in Table 15.

The precision p includes the implicit 1 bit at the beginning of the significand, so the storage used for the significand is p − 1 bits.

ISO/IEC 60559 does not assign a name for a type having the parameters specified for std::bfloat16_t.

— _end note_]

Table 15 — Properties of named extended floating-point types [tab:basic.extended.fp]

| Parameter | float16_t | float32_t | float64_t | float128_t | bfloat16_t |
| --- | --- | --- | --- | --- | --- |
| ISO/IEC 60559 name | binary16 | binary32 | binary64 | binary128 | |
| k, storage width in bits | 16 | 32 | 64 | 128 | 16 |
| p, precision in bits | 11 | 24 | 53 | 113 | 8 |
| emax, maximum exponent | 15 | 127 | 1023 | 16383 | 127 |
| w, exponent field width in bits | 5 | 8 | 11 | 15 | 8 |
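The parameters in the table are related to one another, and the relationships can be checked mechanically. The sketch below encodes each column in a hypothetical `FpFormat` record and verifies two identities that hold for every listed format: one sign bit, w exponent bits, and p − 1 stored significand bits fill the k-bit storage width, and emax equals 2^(w−1) − 1. The names `FpFormat`, `fills_storage`, `emax_matches`, `consistent`, and `SKETCH_HAS_STDFLOAT` are assumptions made for illustration and are not part of the standard.

```cpp
#include <limits>

#if defined(__has_include)
#  if __has_include(<stdfloat>)
#    include <stdfloat>
#    define SKETCH_HAS_STDFLOAT 1  // hypothetical macro for this sketch
#  endif
#endif

// Hypothetical record holding one column of the table.
struct FpFormat {
    int k;     // storage width in bits
    int p;     // precision in bits, including the implicit leading bit
    int emax;  // maximum exponent
    int w;     // exponent field width in bits
};

constexpr FpFormat binary16{16, 11, 15, 5};
constexpr FpFormat binary32{32, 24, 127, 8};
constexpr FpFormat binary64{64, 53, 1023, 11};
constexpr FpFormat binary128{128, 113, 16383, 15};
constexpr FpFormat bfloat16{16, 8, 127, 8};

// Sign bit + exponent field + stored significand (p - 1 bits) == k.
constexpr bool fills_storage(FpFormat f) {
    return 1 + f.w + (f.p - 1) == f.k;
}

// For the formats in the table, emax == 2^(w - 1) - 1.
constexpr bool emax_matches(FpFormat f) {
    return f.emax == (1 << (f.w - 1)) - 1;
}

constexpr bool consistent(FpFormat f) {
    return fills_storage(f) && emax_matches(f);
}

static_assert(consistent(binary16), "binary16 parameters disagree");
static_assert(consistent(binary32), "binary32 parameters disagree");
static_assert(consistent(binary64), "binary64 parameters disagree");
static_assert(consistent(binary128), "binary128 parameters disagree");
static_assert(consistent(bfloat16), "bfloat16 parameters disagree");

#if defined(__STDCPP_FLOAT32_T__) && defined(SKETCH_HAS_STDFLOAT)
// Where the type exists, numeric_limits reports p as its digit count.
static_assert(std::numeric_limits<std::float32_t>::digits == binary32.p,
              "float32_t precision should match the table");
#endif
```

Because all checks are `static_assert`s, a successful compilation is the result; for example, `binary16.p - 1` yields the 10 stored significand bits of `float16_t`.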

Recommended practice: Any names that the implementation provides for the extended floating-point types described in this subsection that are in addition to the names declared in the <stdfloat> header should be chosen to increase compatibility and interoperability with the interchange types _Float16, _Float32, _Float64, and _Float128 defined in ISO/IEC TS 18661-3 and with future versions of ISO/IEC 9899.