
6 Basics [basic]

6.8 Types [basic.types]

6.8.3 Optional extended floating-point types [basic.extended.fp]

If the implementation supports an extended floating-point type ([basic.fundamental]) whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary16, then the typedef-name std::float16_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT16_T__ is defined ([cpp.predefined]), and the floating-point literal suffixes f16 and F16 are supported ([lex.fcon]).
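As an illustration only, the following C++ sketch shows how portable code might detect std::float16_t before naming it or using the f16 suffix. It assumes a C++23 implementation for the guarded part. Where __STDCPP_FLOAT16_T__ is not defined, that part is skipped and only has_float16() remains. The macro HAVE_STD_FLOAT16 and the functions has_float16 and twice are hypothetical names introduced for this example.

```cpp
#include <limits>
#include <type_traits>

// HAVE_STD_FLOAT16 is a hypothetical helper macro for this sketch: it is 1
// only when the implementation advertises binary16 support in C++23 mode
// and the <stdfloat> header is available.
#if defined(__STDCPP_FLOAT16_T__) && __cplusplus > 202002L && __has_include(<stdfloat>)
#include <stdfloat>
#define HAVE_STD_FLOAT16 1
#else
#define HAVE_STD_FLOAT16 0
#endif

// Hypothetical helper: reports whether std::float16_t was detected.
constexpr bool has_float16() { return HAVE_STD_FLOAT16 != 0; }

#if HAVE_STD_FLOAT16
// The f16 suffix produces a literal of type std::float16_t.
static_assert(std::is_same_v<decltype(1.5f16), std::float16_t>);
// binary16 has radix 2 and precision p = 11.
static_assert(std::numeric_limits<std::float16_t>::radix == 2);
static_assert(std::numeric_limits<std::float16_t>::digits == 11);

// Hypothetical helper: doubles a binary16 value (1.5 and 3.0 are exact).
constexpr std::float16_t twice(std::float16_t x) { return x + x; }
static_assert(twice(1.5f16) == 3.0f16);
#endif
```

Testing the predefined macro, rather than the header alone, reflects the rule above: the typedef-name, the macro, and the literal suffixes are provided together or not at all.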

If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary32, then the typedef-name std::float32_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT32_T__ is defined, and the floating-point literal suffixes f32 and F32 are supported.

If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary64, then the typedef-name std::float64_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT64_T__ is defined, and the floating-point literal suffixes f64 and F64 are supported.
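One consequence is worth illustrating. Because std::float64_t names an extended floating-point type, it is a distinct type from double, even on implementations where double also uses the binary64 format. The two can therefore be overloaded separately. The sketch below assumes a C++23 implementation for the guarded part. The function which and the macro HAVE_STD_FLOAT64 are hypothetical names.

```cpp
#include <type_traits>

// HAVE_STD_FLOAT64 is a hypothetical helper macro for this sketch.
#if defined(__STDCPP_FLOAT64_T__) && __cplusplus > 202002L && __has_include(<stdfloat>)
#include <stdfloat>
#define HAVE_STD_FLOAT64 1
#else
#define HAVE_STD_FLOAT64 0
#endif

// Hypothetical overload set: returns a tag identifying the selected overload.
constexpr int which(double) { return 0; }

#if HAVE_STD_FLOAT64
constexpr int which(std::float64_t) { return 64; }

// An extended floating-point type is never one of the standard types.
static_assert(!std::is_same_v<std::float64_t, double>);
// The f64 suffix produces a literal of type std::float64_t.
static_assert(std::is_same_v<decltype(2.0f64), std::float64_t>);
#endif
```

Each literal selects the overload for its own type by exact match, so 1.0 picks which(double) and 1.0f64 picks which(std::float64_t).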

If the implementation supports an extended floating-point type whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary128, then the typedef-name std::float128_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT128_T__ is defined, and the floating-point literal suffixes f128 and F128 are supported.

If the implementation supports an extended floating-point type with the properties, as specified by ISO/IEC 60559, of radix (b) of 2, storage width in bits (k) of 16, precision in bits (p) of 8, maximum exponent (emax) of 127, and exponent field width in bits (w) of 8, then the typedef-name std::bfloat16_t is declared in the header <stdfloat> and names such a type, the macro __STDCPP_BFLOAT16_T__ is defined, and the floating-point literal suffixes bf16 and BF16 are supported.
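These parameters give std::bfloat16_t the same exponent field width and maximum exponent as binary32, but a much shorter significand. A hedged C++23 sketch can compare the two through std::numeric_limits. There, max_exponent reports emax + 1 and min_exponent reports emin + 1, so equal values mean the same exponent range. HAVE_BF16_AND_F32 is a hypothetical macro, and the checks run only when both types are provided.

```cpp
#include <limits>

// HAVE_BF16_AND_F32 is a hypothetical helper macro for this sketch.
#if defined(__STDCPP_BFLOAT16_T__) && defined(__STDCPP_FLOAT32_T__) && \
    __cplusplus > 202002L && __has_include(<stdfloat>)
#include <stdfloat>
#define HAVE_BF16_AND_F32 1
#else
#define HAVE_BF16_AND_F32 0
#endif

#if HAVE_BF16_AND_F32
using bf16 = std::numeric_limits<std::bfloat16_t>;
using f32 = std::numeric_limits<std::float32_t>;

// Same exponent range as binary32 (emax = 127, w = 8) ...
static_assert(bf16::max_exponent == f32::max_exponent);
static_assert(bf16::min_exponent == f32::min_exponent);
// ... but only 8 bits of precision instead of 24.
static_assert(bf16::digits == 8 && f32::digits == 24);
#endif
```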

[Note 1:

A summary of the parameters for each type is given in Table 15.

The precision p includes the implicit 1 bit at the beginning of the significand, so the storage used for the significand is p−1 bits.

ISO/IEC 60559 does not assign a name for a type having the parameters specified for std::bfloat16_t.

— _end note_]
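The parameters of each type are tied together by the layout of an ISO/IEC 60559 binary format. One sign bit, a w-bit exponent field, and p - 1 stored significand bits give k = 1 + w + (p - 1). The maximum exponent, which is also the exponent bias, is emax = 2^(w-1) - 1. The following self-contained C++ sketch checks both relations at compile time for each type's parameters. The Format struct and the helper functions are hypothetical names for this example.

```cpp
// Hypothetical record of one type's parameters: k, p, emax, w.
struct Format { const char* name; int k, p, emax, w; };

constexpr Format formats[] = {
    {"float16_t",  16,  11,    15,  5},
    {"float32_t",  32,  24,   127,  8},
    {"float64_t",  64,  53,  1023, 11},
    {"float128_t", 128, 113, 16383, 15},
    {"bfloat16_t", 16,   8,   127,  8},
};

// Bits actually stored for the significand: the leading 1 is implicit.
constexpr int stored_significand_bits(const Format& f) { return f.p - 1; }

// Storage width = sign bit + exponent field + stored significand bits.
constexpr bool layout_adds_up(const Format& f) {
    return f.k == 1 + f.w + stored_significand_bits(f);
}

// The largest finite exponent equals the bias, 2^(w-1) - 1.
constexpr bool emax_matches_width(const Format& f) {
    return f.emax == (1 << (f.w - 1)) - 1;
}

// True only if every entry satisfies both relations.
constexpr bool all_consistent() {
    for (const Format& f : formats)
        if (!layout_adds_up(f) || !emax_matches_width(f)) return false;
    return true;
}

static_assert(all_consistent());
```

For example, binary16 stores 11 − 1 = 10 significand bits, and 1 + 5 + 10 = 16. For std::bfloat16_t, 1 + 8 + 7 = 16.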

Table 15 — Properties of named extended floating-point types [tab:basic.extended.fp]

Parameter	float16_t	float32_t	float64_t	float128_t	bfloat16_t
ISO/IEC 60559 name	binary16	binary32	binary64	binary128	(none)
k, storage width in bits	16	32	64	128	16
p, precision in bits	11	24	53	113	8
emax, maximum exponent	15	127	1023	16383	127
w, exponent field width in bits	5	8	11	15	8
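Where the types are available, the rows of Table 15 correspond to std::numeric_limits members. The member digits equals p, and max_exponent equals emax + 1, because numeric_limits counts one past the largest power of the radix. This is a sketch under C++23 assumptions. matches_table and HAVE_STDFLOAT are hypothetical names, and each check is compiled only if the corresponding predefined macro is defined.

```cpp
#include <limits>

// HAVE_STDFLOAT is a hypothetical helper macro for this sketch.
#if __cplusplus > 202002L && __has_include(<stdfloat>)
#include <stdfloat>
#define HAVE_STDFLOAT 1
#else
#define HAVE_STDFLOAT 0
#endif

// Hypothetical helper: true if numeric_limits<T> agrees with one column
// of Table 15 (radix 2, precision p, maximum exponent emax).
template <class T>
constexpr bool matches_table(int p, int emax) {
    using L = std::numeric_limits<T>;
    return L::is_specialized && L::radix == 2 && L::digits == p &&
           L::max_exponent == emax + 1;
}

#if HAVE_STDFLOAT && defined(__STDCPP_FLOAT16_T__)
static_assert(matches_table<std::float16_t>(11, 15));
#endif
#if HAVE_STDFLOAT && defined(__STDCPP_FLOAT32_T__)
static_assert(matches_table<std::float32_t>(24, 127));
#endif
#if HAVE_STDFLOAT && defined(__STDCPP_FLOAT64_T__)
static_assert(matches_table<std::float64_t>(53, 1023));
#endif
#if HAVE_STDFLOAT && defined(__STDCPP_FLOAT128_T__)
static_assert(matches_table<std::float128_t>(113, 16383));
#endif
#if HAVE_STDFLOAT && defined(__STDCPP_BFLOAT16_T__)
static_assert(matches_table<std::bfloat16_t>(8, 127));
#endif
```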

Recommended practice: Any names that the implementation provides for the extended floating-point types described in this subsection, in addition to the names declared in the header <stdfloat>, should be chosen to increase compatibility and interoperability with the interchange types _Float16, _Float32, _Float64, and _Float128 defined in ISO/IEC TS 18661-3 and with future versions of ISO/IEC 9899.