6.8.3 Optional extended floating-point types [basic.extended.fp]
If the implementation supports an extended floating-point type ([basic.fundamental]) whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary16, then the typedef-name std::float16_t is declared in the header `<stdfloat>` and names such a type, the macro `__STDCPP_FLOAT16_T__` is defined ([cpp.predefined]), and the floating-point literal suffixes f16 and F16 are supported ([lex.fcon]).
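A minimal sketch of how portable code might act on this paragraph: it tests the macro before naming std::float16_t, then checks through std::numeric_limits that the type reports binary16's p = 11 and emax = 15, and that the f16 suffix yields that type. The function names are illustrative, not part of any library. The guards assume only that an implementation without binary16 support leaves `__STDCPP_FLOAT16_T__` undefined, in which case the type-specific branch compiles away.

```cpp
#include <limits>
#include <type_traits>

// Pull in <stdfloat> only where the header exists (pre-C++23 toolchains may lack it).
#if defined(__has_include)
#  if __has_include(<stdfloat>)
#    include <stdfloat>
#  endif
#endif

// Illustrative helper: reports whether the implementation advertises std::float16_t.
constexpr bool float16_available() {
#if defined(__STDCPP_FLOAT16_T__)
    return true;
#else
    return false;
#endif
}

// Illustrative helper: true if std::float16_t is absent, or present with binary16's parameters.
constexpr bool float16_matches_binary16() {
#if defined(__STDCPP_FLOAT16_T__)
    using L = std::numeric_limits<std::float16_t>;
    // numeric_limits::digits is p (implicit leading bit included);
    // numeric_limits::max_exponent is emax + 1, since 2^(max_exponent - 1) is the largest finite power of 2.
    return L::radix == 2 && L::digits == 11 && L::max_exponent == 16
        // The f16 suffix gives the literal type std::float16_t.
        && std::is_same_v<decltype(1.5f16), std::float16_t>;
#else
    return true;  // the type is optional; nothing to check
#endif
}
```

Where the type is absent the check is vacuously true, so `static_assert(float16_matches_binary16())` is expected to hold whether or not binary16 is offered.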
If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary32, then the typedef-name std::float32_t is declared in the header `<stdfloat>` and names such a type, the macro `__STDCPP_FLOAT32_T__` is defined, and the floating-point literal suffixes f32 and F32 are supported.
If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary64, then the typedef-name std::float64_t is declared in the header `<stdfloat>` and names such a type, the macro `__STDCPP_FLOAT64_T__` is defined, and the floating-point literal suffixes f64 and F64 are supported.
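Because std::float64_t names an extended floating-point type, it is a different type from double, even on the common platforms where double is also binary64. The sketch below uses that fact to overload a function on both types and shows that the f64 suffix selects the extended-type overload. `Kind` and `pick` are hypothetical names chosen for illustration.

```cpp
#if defined(__has_include)
#  if __has_include(<stdfloat>)
#    include <stdfloat>
#  endif
#endif

// Hypothetical tag telling which overload was chosen.
enum class Kind { standard_double, extended_float64 };

constexpr Kind pick(double) { return Kind::standard_double; }

#if defined(__STDCPP_FLOAT64_T__)
// This second overload is legal only because std::float64_t is not double.
constexpr Kind pick(std::float64_t) { return Kind::extended_float64; }
#endif

// True when overload resolution follows the literal's type.
constexpr bool literal_suffix_selects_float64() {
#if defined(__STDCPP_FLOAT64_T__)
    return pick(1.0) == Kind::standard_double       // unsuffixed literal: double
        && pick(1.0f64) == Kind::extended_float64;  // f64 suffix: std::float64_t
#else
    return pick(1.0) == Kind::standard_double;      // only the double overload exists
#endif
}
```

Each call is an exact match for exactly one overload, so neither call is ambiguous.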
If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary128, then the typedef-name std::float128_t is declared in the header `<stdfloat>` and names such a type, the macro `__STDCPP_FLOAT128_T__` is defined, and the floating-point literal suffixes f128 and F128 are supported.
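The extra precision of binary128 is easy to observe. In this sketch, assuming the common case that double is binary64, the sum 1 + 2^-100 rounds back to 1 in double (p = 53) but stays distinct from 1 in std::float128_t (p = 113), since representing it exactly needs 101 bits. The helper names are hypothetical, and the float128 check is skipped where the macro is undefined.

```cpp
#if defined(__has_include)
#  if __has_include(<stdfloat>)
#    include <stdfloat>
#  endif
#endif

// Assumes double is binary64: at 53 bits, 1 + 2^-100 rounds to exactly 1.
constexpr bool double_rounds_away_2_pow_minus_100() {
    return (1.0 + 0x1p-100) == 1.0;
}

// True if std::float128_t is absent, or present and able to hold 1 + 2^-100.
constexpr bool float128_keeps_2_pow_minus_100() {
#if defined(__STDCPP_FLOAT128_T__)
    // double -> std::float128_t is an implicit widening conversion, exact for 2^-100.
    constexpr std::float128_t tiny = 0x1p-100;
    return (1.0f128 + tiny) != 1.0f128;  // 113 bits of precision keep the low bit
#else
    return true;  // the type is optional; nothing to check
#endif
}
```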
If the implementation supports an extended floating-point type with the properties, as specified by ISO/IEC 60559, of radix (b) of 2, storage width in bits (k) of 16, precision in bits (p) of 8, maximum exponent (emax) of 127, and exponent field width in bits (w) of 8, then the typedef-name std::bfloat16_t is declared in the header `<stdfloat>` and names such a type, the macro `__STDCPP_BFLOAT16_T__` is defined, and the floating-point literal suffixes bf16 and BF16 are supported.
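std::bfloat16_t occupies the same 16 bits as binary16 but divides them differently: w = 8 gives it binary32's exponent range, at the cost of only p = 8 bits of precision. The sketch below, using a hypothetical function name, checks those parameters through std::numeric_limits, along with the type of a bf16 literal, whenever `__STDCPP_BFLOAT16_T__` is defined.

```cpp
#include <limits>
#include <type_traits>

#if defined(__has_include)
#  if __has_include(<stdfloat>)
#    include <stdfloat>
#  endif
#endif

// Illustrative helper: true if std::bfloat16_t is absent, or present with b = 2, p = 8, emax = 127.
constexpr bool bfloat16_matches_parameters() {
#if defined(__STDCPP_BFLOAT16_T__)
    using L = std::numeric_limits<std::bfloat16_t>;
    return L::radix == 2
        && L::digits == 8                          // p, implicit bit included
        && L::max_exponent == 128                  // emax + 1
        // Largest finite value is (2 - 2^-7) * 2^127, about 3.39e38;
        // binary16, by contrast, tops out at 65504.
        && static_cast<double>(L::max()) > 3.0e38
        && std::is_same_v<decltype(1.0bf16), std::bfloat16_t>;  // bf16 suffix
#else
    return true;  // the type is optional; nothing to check
#endif
}
```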
[Note 1:
A summary of the parameters for each type is given in Table 15.
The precision p includes the implicit 1 bit at the beginning of the significand, so the storage used for the significand is p − 1 bits.
ISO/IEC 60559 does not assign a name for a type having the parameters specified for std::bfloat16_t.
— _end note_]
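Because the implicit bit is not stored, each column of Table 15 accounts for all k bits as one sign bit, w exponent bits, and p − 1 significand bits. The exponent field width also fixes emax through the ISO/IEC 60559 relation for binary formats, emax = 2^(w−1) − 1. Worked for binary16 and for the bfloat16_t parameters:

$$
\begin{aligned}
k &= 1 + w + (p - 1), & e_{\max} &= 2^{w-1} - 1,\\
\text{binary16:}\quad 16 &= 1 + 5 + (11 - 1), & 15 &= 2^{4} - 1,\\
\text{bfloat16\_t:}\quad 16 &= 1 + 8 + (8 - 1), & 127 &= 2^{7} - 1.
\end{aligned}
$$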
Table 15 — Properties of named extended floating-point types [tab:basic.extended.fp]
| Parameter | float16_t | float32_t | float64_t | float128_t | bfloat16_t |
|---|---|---|---|---|---|
| ISO/IEC 60559 name | binary16 | binary32 | binary64 | binary128 | |
| k, storage width in bits | 16 | 32 | 64 | 128 | 16 |
| p, precision in bits | 11 | 24 | 53 | 113 | 8 |
| emax, maximum exponent | 15 | 127 | 1023 | 16383 | 127 |
| w, exponent field width in bits | 5 | 8 | 11 | 15 | 8 |
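The rows of Table 15 can be cross-checked against two ISO/IEC 60559 relations for binary formats: the k bits split into one sign bit, w exponent bits, and p − 1 stored significand bits, and emax = 2^(w−1) − 1. This sketch transcribes the table into a constexpr array and verifies both relations at compile time. The `Params` struct, `table15` array, and function names are illustrative only, and nothing here needs the optional types or a C++23 compiler (C++14 suffices).

```cpp
#include <array>
#include <cstddef>

// One column of Table 15 (hypothetical layout chosen for this sketch).
struct Params {
    const char* type_name;
    int k;     // storage width in bits
    int p;     // precision in bits, implicit leading bit included
    int emax;  // maximum exponent
    int w;     // exponent field width in bits
};

constexpr std::array<Params, 5> table15{{
    {"float16_t", 16, 11, 15, 5},
    {"float32_t", 32, 24, 127, 8},
    {"float64_t", 64, 53, 1023, 11},
    {"float128_t", 128, 113, 16383, 15},
    {"bfloat16_t", 16, 8, 127, 8},
}};

// Sign bit + exponent field + stored significand (implicit bit excluded) fill k bits.
constexpr bool widths_add_up(const Params& t) { return 1 + t.w + (t.p - 1) == t.k; }

// The largest unbiased exponent of a w-bit exponent field is 2^(w-1) - 1.
constexpr bool emax_matches_w(const Params& t) { return t.emax == (1 << (t.w - 1)) - 1; }

constexpr bool table_consistent() {
    for (std::size_t i = 0; i < table15.size(); ++i) {
        if (!widths_add_up(table15[i]) || !emax_matches_w(table15[i])) return false;
    }
    return true;
}

static_assert(table_consistent(), "every row satisfies k = 1 + w + (p - 1) and emax = 2^(w-1) - 1");
```

The check also makes the bfloat16_t design visible: it reuses binary32's w = 8 and emax = 127 while shrinking p from 24 to 8 to fit k = 16.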
Recommended practice: Any names that the implementation provides for the extended floating-point types described in this subsection that are in addition to the names declared in the header `<stdfloat>` should be chosen to increase compatibility and interoperability with the interchange types _Float16, _Float32, _Float64, and _Float128 defined in ISO/IEC TS 18661-3 and with future versions of ISO/IEC 9899.