
5 Lexical conventions [lex]

5.13 Literals [lex.literal]

5.13.2 Integer literals [lex.icon]

octal-digit: one of
0 1 2 3 4 5 6 7

nonzero-digit: one of
1 2 3 4 5 6 7 8 9

hexadecimal-prefix: one of
0x 0X

hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F

unsigned-suffix: one of
u U

long-long-suffix: one of
ll LL

In an integer-literal, the sequence of binary-digits, octal-digits, digits, or hexadecimal-digits is interpreted as a base N integer as shown in Table 7 (N is 2, 8, 10, or 16, respectively); the lexically first digit of the sequence of digits is the most significant.

[Note 1:

The prefix and any optional separating single quotes are ignored when determining the value.

end note]

The hexadecimal-digits a through f and A through F have decimal values ten through fifteen.

[Example 1:

The number twelve can be written 12, 014, 0XC, or 0b1100.

The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value.

end example]
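The values in Example 1 can be checked at compile time. This is a minimal sketch assuming a C++17 compiler (binary literals and digit separators need C++14; `static_assert` without a message needs C++17). The array names `twelve_forms` and `two_pow_20` are illustrative only.

```cpp
#include <cassert>

// Twelve in decimal, octal (leading 0), hexadecimal (0X prefix), and binary (0b prefix).
constexpr int twelve_forms[] = {12, 014, 0XC, 0b1100};

// 2^20 written five ways; prefixes and digit separators do not change the value.
constexpr long two_pow_20[] = {1048576, 1'048'576, 0X100000, 0x10'0000, 0'004'000'000};

// Octal 4000000 is 4 * 8^6 = 4 * 262144 = 1048576.
static_assert(0'004'000'000 == 1048576);
static_assert(014 == 12 && 0XC == 12 && 0b1100 == 12);
```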

The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix in which its value can be represented.

Table 8: Types of integer-literals [tab:lex.icon.type]

integer-suffix | decimal-literal | integer-literal other than decimal-literal
none | int, long int, long long int | int, unsigned int, long int, unsigned long int, long long int, unsigned long long int
u or U | unsigned int, unsigned long int, unsigned long long int | unsigned int, unsigned long int, unsigned long long int
l or L | long int, long long int | long int, unsigned long int, long long int, unsigned long long int
Both u or U and l or L | unsigned long int, unsigned long long int | unsigned long int, unsigned long long int
ll or LL | long long int | long long int, unsigned long long int
Both u or U and ll or LL | unsigned long long int | unsigned long long int
z or Z | the signed integer type corresponding to std::size_t ([support.types.layout]) | the signed integer type corresponding to std::size_t, std::size_t
Both u or U and z or Z | std::size_t | std::size_t

Except for integer-literals containing a size-suffix, if the value of an integer-literal cannot be represented by any type in its list and an extended integer type ([basic.fundamental]) can represent its value, it may have that extended integer type.

If all of the types in the list for the integer-literal are signed, the extended integer type is signed.

If all of the types in the list for the integer-literal are unsigned, the extended integer type is unsigned.

If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned.

If an integer-literal cannot be represented by any of the allowed types, the program is ill-formed.

[Note 2:

An integer-literal with a z or Z suffix is ill-formed if it cannot be represented by std::size_t.

end note]
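The lists in Table 8 can be probed with `std::is_same_v`. The sketch below assumes C++17; the check on `0xFFFFFFFF` is wrapped in a preprocessor guard because its result depends on the width of `int`. The z suffix (C++23) is deliberately left out so the sketch still compiles on older toolchains. The name `decimal_is_signed` is illustrative.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// A suffix selects the list; for a small value the first entry wins.
static_assert(std::is_same_v<decltype(42), int>);
static_assert(std::is_same_v<decltype(42u), unsigned int>);
static_assert(std::is_same_v<decltype(42l), long>);
static_assert(std::is_same_v<decltype(42ul), unsigned long>);
static_assert(std::is_same_v<decltype(42ll), long long>);
static_assert(std::is_same_v<decltype(42ull), unsigned long long>);

// An unsuffixed decimal-literal only tries signed types, so 4294967295
// becomes long or long long (whichever is first wide enough).
constexpr bool decimal_is_signed = std::is_signed_v<decltype(4294967295)>;

#if INT_MAX == 0x7FFFFFFF && UINT_MAX == 0xFFFFFFFF
// With a 32-bit int, the same value in hexadecimal does not fit int
// but does fit unsigned int, which is next in its list.
static_assert(std::is_same_v<decltype(0xFFFFFFFF), unsigned int>);
#endif
```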

5.13.3 Character literals [lex.ccon]

encoding-prefix: one of
u8 u U L

basic-c-char:
any member of the translation character set except the U+0027 apostrophe,
U+005c reverse solidus, or new-line character

simple-escape-sequence-char: one of
' " ? \ a b f n r t v

conditional-escape-sequence-char:
any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters N, o, u, U, or x

A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char.

A multicharacter literal shall not have an encoding-prefix.

If a multicharacter literal contains a c-char that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed.

Multicharacter literals are conditionally-supported.

The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.

Table 9: Character literals [tab:lex.ccon.literal]

Encoding prefix | Kind | Type | Associated character encoding | Example
none | ordinary character literal | char | ordinary literal encoding | 'v'
none | multicharacter literal | int | ordinary literal encoding | 'abcd'
L | wide character literal | wchar_t | wide literal encoding | L'w'
u8 | UTF-8 character literal | char8_t | UTF-8 | u8'x'
u | UTF-16 character literal | char16_t | UTF-16 | u'y'
U | UTF-32 character literal | char32_t | UTF-32 | U'z'
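The types in Table 9 can be checked directly. This sketch assumes C++17; the `u8` row is guarded by the `__cpp_char8_t` feature-test macro because `u8'x'` has type `char` in C++17 and `char8_t` only once `char8_t` exists (C++20). Multicharacter literals are not exercised, since their value is implementation-defined and compilers commonly warn about them. The names `y16` and `z32` are illustrative.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same_v<decltype('v'), char>);
static_assert(std::is_same_v<decltype(L'w'), wchar_t>);
static_assert(std::is_same_v<decltype(u'y'), char16_t>);
static_assert(std::is_same_v<decltype(U'z'), char32_t>);
#if defined(__cpp_char8_t)
static_assert(std::is_same_v<decltype(u8'x'), char8_t>);
#endif

// UTF-16 and UTF-32 fix the values: 'y' is U+0079 and 'z' is U+007A.
constexpr char16_t y16 = u'y';
constexpr char32_t z32 = U'z';
```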

In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.

A multicharacter literal has an implementation-defined value.

The value of any other kind of character-literal is determined as follows:

The character specified by a simple-escape-sequence is specified in Table 10.

[Note 1:

Using an escape sequence for a question mark is supported for compatibility with C++ 2014 and C.

end note]

Table 10: Simple escape sequences [tab:lex.ccon.esc]

character | simple-escape-sequence
U+000a line feed | \n
U+0009 character tabulation | \t
U+000b line tabulation | \v
U+0008 backspace | \b
U+000d carriage return | \r
U+000c form feed | \f
U+0007 alert | \a
U+005c reverse solidus | \\
U+003f question mark | \?
U+0027 apostrophe | \'
U+0022 quotation mark | \"
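Because the ordinary literal encoding is implementation-defined, a portable way to check Table 10 is through UTF-32 character literals, whose values are the code points themselves. A sketch assuming C++17; the array and function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Each escape from Table 10 next to the code point it denotes.
constexpr char32_t escapes[] = {U'\n', U'\t', U'\v', U'\b', U'\r', U'\f', U'\a',
                                U'\\', U'\?', U'\'', U'\"'};
constexpr char32_t code_points[] = {0x0A, 0x09, 0x0B, 0x08, 0x0D, 0x0C, 0x07,
                                    0x5C, 0x3F, 0x27, 0x22};

constexpr bool escapes_match() {
    for (std::size_t i = 0; i < sizeof(escapes) / sizeof(escapes[0]); ++i)
        if (escapes[i] != code_points[i]) return false;
    return true;
}
static_assert(escapes_match());

// \? is the same character as an unescaped ?, and \" the same as " in a character literal.
static_assert('\?' == '?');
static_assert('\"' == '"');
```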

5.13.4 Floating-point literals [lex.fcon]

floating-point-suffix: one of
f l f16 f32 f64 f128 bf16 F L F16 F32 F64 F128 BF16

[Note 1:

The floating-point suffixes f16, f32, f64, f128, bf16, F16, F32, F64, F128, and BF16 are conditionally-supported.

end note]

Table 11: Types of floating-point-literals [tab:lex.fcon.type]

floating-point-suffix | type
none | double
f or F | float
l or L | long double
f16 or F16 | std::float16_t
f32 or F32 | std::float32_t
f64 or F64 | std::float64_t
f128 or F128 | std::float128_t
bf16 or BF16 | std::bfloat16_t
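The first three rows of Table 11 apply to every implementation and can be checked as below. The sketch assumes C++17 (needed for hexadecimal floating literals); the conditionally-supported suffixes such as `f32` are not exercised because they require C++23 and `<stdfloat>`. The name `quarter_f` is illustrative.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same_v<decltype(1.0), double>);
static_assert(std::is_same_v<decltype(1.0f), float>);
static_assert(std::is_same_v<decltype(1.0L), long double>);
// The suffix rules are the same for the hexadecimal form.
static_assert(std::is_same_v<decltype(0x1p-2), double>);

// 0x1p-2 is 1 * 2^-2; the trailing f follows the exponent, so it is a suffix, not a hex digit.
constexpr float quarter_f = 0x1p-2f;
```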

The significand of a floating-point-literal is the fractional-constant or digit-sequence of a decimal-floating-point-literal or the hexadecimal-fractional-constant or hexadecimal-digit-sequence of a hexadecimal-floating-point-literal.

In the significand, the sequence of digits or hexadecimal-digits and optional period are interpreted as a base N real number s, where N is 10 for a decimal-floating-point-literal and 16 for a hexadecimal-floating-point-literal.

[Note 2:

Any optional separating single quotes are ignored when determining the value.

end note]

If an exponent-part or binary-exponent-part is present, the exponent e of the floating-point-literal is the result of interpreting the sequence of an optional sign and the digits as a base 10 integer.

Otherwise, the exponent e is 0.

The scaled value of the literal is $s \times 10^{e}$ for a decimal-floating-point-literal and $s \times 2^{e}$ for a hexadecimal-floating-point-literal.

[Example 1:

The floating-point-literals 49.625 and 0xC.68p+2 have the same value.

The floating-point-literals 1.602'176'565e-19 and 1.602176565e-19 have the same value.

end example]
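The arithmetic behind Example 1 is: for 0xC.68p+2, s = 0xC.68 = 12 + 6/16 + 8/256 = 12.40625 and e = 2, so the scaled value is 12.40625 × 4 = 49.625. Since 49.625 is exactly representable in binary, both literals have exactly that value. A sketch assuming C++17; `from_hex` is an illustrative name.

```cpp
#include <cassert>

// Hexadecimal: significand alone (e = 0), then scaled by 2^2.
static_assert(0xC.68p0 == 12.40625);
static_assert(0xC.68p+2 == 49.625);

// Decimal: s = 4.9625 and e = 1, so the scaled value is s * 10^1 = 49.625.
static_assert(4.9625e1 == 49.625);

// Digit separators are ignored when determining the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19);

constexpr double from_hex = 0xC.68p+2;
```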

If the scaled value is not in the range of representable values for its type, the program is ill-formed.

Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
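The difference between an exactly representable scaled value and a rounded one can be observed by comparing a literal in two precisions. The sketch assumes C++17 and checks the rounding claim only when `float` and `double` are IEEE 754 (`is_iec559`), since other formats round differently. The names `iec` and `tenth_differs` are illustrative.

```cpp
#include <cassert>
#include <limits>

// 0.5 = 2^-1 is exactly representable, so the literal's value is the scaled value itself.
static_assert(0.5 == 0x1p-1);

// 0.1 has no finite binary expansion, so 0.1f and 0.1 are each a nearby
// representable value of their own type; in IEEE 754 binary32/binary64 they differ.
constexpr bool iec = std::numeric_limits<float>::is_iec559 &&
                     std::numeric_limits<double>::is_iec559;
constexpr bool tenth_differs = static_cast<double>(0.1f) != 0.1;
```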

5.13.5 String literals [lex.string]

basic-s-char:
any member of the translation character set except the U+0022 quotation mark,
U+005c reverse solidus, or new-line character

r-char:
any member of the translation character set, except a U+0029 right parenthesis followed by
the initial d-char-sequence (which may be empty) followed by a U+0022 quotation mark

d-char:
any member of the basic character set except:
U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus,
U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line

The kind of a string-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding prefix and sequence of s-chars or r-chars as defined by Table 12, where n is the number of encoded code units that would result from an evaluation of the string-literal (see below).

Table 12: String literals [tab:lex.string.literal]

Encoding prefix | Kind | Type | Associated character encoding | Examples
none | ordinary string literal | array of n const char | ordinary literal encoding | "ordinary string", R"(ordinary raw string)"
L | wide string literal | array of n const wchar_t | wide literal encoding | L"wide string", LR"w(wide raw string)w"
u8 | UTF-8 string literal | array of n const char8_t | UTF-8 | u8"UTF-8 string", u8R"x(UTF-8 raw string)x"
u | UTF-16 string literal | array of n const char16_t | UTF-16 | u"UTF-16 string", uR"y(UTF-16 raw string)y"
U | UTF-32 string literal | array of n const char32_t | UTF-32 | U"UTF-32 string", UR"z(UTF-32 raw string)z"
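Because n counts encoded code units (plus the terminating null), the same character can produce arrays of different lengths under different prefixes. A sketch assuming C++17; universal-character-names are used so the result does not depend on the source file's encoding, and the `n_*` names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A string-literal is an lvalue of array type, so decltype yields a reference.
static_assert(std::is_same_v<decltype("ab"), const char (&)[3]>);

constexpr std::size_t n_ordinary = sizeof("ab");                             // 'a' 'b' '\0'
constexpr std::size_t n_utf8 = sizeof(u8"\u00E9");                           // 0xC3 0xA9 0x00
constexpr std::size_t n_utf16 = sizeof(u"\U0001F600") / sizeof(char16_t);   // surrogate pair + null
constexpr std::size_t n_utf32 = sizeof(U"\U0001F600") / sizeof(char32_t);   // one code unit + null
```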

A string-literal that has an R in the prefix is a raw string literal.

The d-char-sequence serves as a delimiter.

The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence.

A d-char-sequence shall consist of at most 16 characters.

[Note 1:

The characters '(' and ')' can appear in a raw-string.

Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".

end note]

[Note 2:

A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal.

Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:

const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);

end note]

[Example 1:

The raw string

R"a(
)\
a"
)a"

is equivalent to "\n)\\\na\"\n".

The raw string R"(x = "\"y\"")" is equivalent to "x = \"\\\"y\\\"\"".

end example]
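The raw-string equivalences from Note 1 and Example 1 can be confirmed with `std::string_view`, whose comparisons are usable in constant expressions from C++17 on. A sketch under that assumption; the variable names are illustrative. The multi-line literal is kept flush left because leading whitespace would become part of its content.

```cpp
#include <cassert>
#include <string_view>

// Parentheses inside the raw string are content; only )delimiter" ends it.
constexpr std::string_view with_delim = R"delimiter((a|b))delimiter";

// Backslashes and quotes are taken literally in a raw string.
constexpr std::string_view raw_quotes = R"(x = "\"y\"")";
constexpr std::string_view cooked_quotes = "x = \"\\\"y\\\"\"";

// Each source new-line is kept, and the backslash before a new-line is not a line splice here.
constexpr std::string_view raw_lines = R"a(
)\
a"
)a";

static_assert(with_delim == "(a|b)");
static_assert(raw_quotes == cooked_quotes);
static_assert(raw_lines == "\n)\\\na\"\n");
```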

Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals.

The string-literals in any sequence of adjacent string-literals shall have at most one unique encoding-prefix among them.

The common encoding-prefix of the sequence is that encoding-prefix, if any.

[Note 3:

A string-literal's rawness has no effect on the determination of the common encoding-prefix.

end note]

In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated.

The lexical structure and grouping of the contents of the individual string-literals is retained.

[Example 2:

"\xA" "B" represents the code unit '\xA' and the character 'B' after concatenation (and not the single code unit '\xAB').

Similarly, R"(\u00)" "41" represents six characters, starting with a backslash and ending with the digit 1 (and not the single character 'A' specified by a universal-character-name).

Table 13 has some examples of valid concatenations.

end example]

Table 13: String literal concatenations [tab:lex.string.concat]

Source | Means | Source | Means | Source | Means
u"a" u"b" | u"ab" | U"a" U"b" | U"ab" | L"a" L"b" | L"ab"
u"a" "b" | u"ab" | U"a" "b" | U"ab" | L"a" "b" | L"ab"
"a" u"b" | u"ab" | "a" U"b" | U"ab" | "a" L"b" | L"ab"
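The concatenation rules, including the point that each piece keeps its own lexical grouping, can be observed through array sizes and contents. A sketch assuming C++17; the names are illustrative.

```cpp
#include <cassert>
#include <string_view>

// The \xA escape ends at its literal's boundary: two code units plus null, not '\xAB'.
constexpr char hex_then_b[] = "\xA" "B";

// A raw backslash followed by u, 0, 0 and then 4, 1: no universal-character-name is formed.
constexpr char no_ucn[] = R"(\u00)" "41";

// An unprefixed piece takes the common encoding-prefix (Table 13).
constexpr std::u16string_view mixed = u"a" "b";

static_assert(sizeof(hex_then_b) == 3);  // '\xA', 'B', '\0'
static_assert(sizeof(no_ucn) == 7);      // six characters plus '\0'
```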

Evaluating a string-literal results in a string literal object with static storage duration ([basic.stc]).

[Note 4:

String literal objects are potentially non-unique ([intro.object]).

Whether successive evaluations of a string-literal yield the same or a different object is unspecified.

end note]

[Note 5:

The effect of attempting to modify a string literal object is undefined.

end note]

String literal objects are initialized with the sequence of code unit values corresponding to the string-literal's sequence of s-chars (originally from non-raw string literals) and r-chars (originally from raw string literals), plus a terminating U+0000 null character, in order as follows:

5.13.6 Unevaluated strings [lex.string.uneval]

An unevaluated-string shall have no encoding-prefix.

An unevaluated-string is never evaluated and its interpretation depends on the context in which it appears.

5.13.7 Boolean literals [lex.bool]

boolean-literal:
false
true

The Boolean literals are the keywords false and true.

Such literals have type bool.

5.13.8 Pointer literals [lex.nullptr]

The pointer literal is the keyword nullptr.

It has type std::nullptr_t.

[Note 1:

std::nullptr_t is a distinct type that is neither a pointer type nor a pointer-to-member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value.

end note]
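The properties in Note 1 can be demonstrated with type traits and overload resolution. A sketch assuming C++17; `Widget`, `pick`, `null_obj`, and `null_member` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);
// std::nullptr_t is neither a pointer type nor a pointer-to-member type.
static_assert(!std::is_pointer_v<std::nullptr_t>);
static_assert(!std::is_member_pointer_v<std::nullptr_t>);

struct Widget { int size; };  // hypothetical type

// nullptr converts to a null object pointer and to a null member pointer.
int* null_obj = nullptr;
int Widget::* null_member = nullptr;

// Unlike the literal 0, nullptr never selects an integer overload.
int pick(int) { return 1; }
int pick(int*) { return 2; }
```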

5.13.9 User-defined literals [lex.ext]

If a token matches both user-defined-literal and another literal kind, it is treated as the latter.

[Example 1:

123_km is a user-defined-literal, but 12LL is an integer-literal.

end example]

The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.

A user-defined-literal is treated as a call to a literal operator or literal operator template ([over.literal]).

To determine the form of this call for a given user-defined-literal L with ud-suffix X, first let S be the set of declarations found by unqualified lookup for the literal-operator-id whose literal suffix identifier is X ([basic.lookup.unqual]).

S shall not be empty.

If L is a user-defined-integer-literal, let n be the literal without its ud-suffix.

If S contains a literal operator with parameter type unsigned long long, the literal L is treated as a call of the form operator ""X(nULL)

Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.

If S contains a raw literal operator, the literal L is treated as a call of the form operator ""X("n")

Otherwise (S contains a numeric literal operator template), L is treated as a call of the form operator ""X<'c1', 'c2', ... 'ck'>() where n is the source character sequence c1c2...ck.

[Note 1:

The sequence c1c2...ck can only contain characters from the basic character set.

end note]
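The three forms for a user-defined-integer-literal can be sketched with one suffix per form, so that each set S contains exactly one kind of declaration. This assumes C++17; the suffixes `_km`, `_digits`, and `_count` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Cooked form: S has an unsigned long long operator, so 12_km is operator""_km(12ULL).
constexpr unsigned long long operator""_km(unsigned long long n) { return n * 1000; }

// Raw form: 0x1F_digits is operator""_digits("0x1F"); the prefix arrives as written.
std::size_t operator""_digits(const char* s) { return std::strlen(s); }

// Numeric literal operator template: 123_count is operator""_count<'1', '2', '3'>().
template <char... Cs>
constexpr std::size_t operator""_count() { return sizeof...(Cs); }

static_assert(12_km == 12000);
static_assert(123_count == 3);
```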

If L is a user-defined-floating-point-literal, let f be the literal without its ud-suffix.

If S contains a literal operator with parameter type long double, the literal L is treated as a call of the form operator ""X(fL)

Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.

If S contains a raw literal operator, the literal L is treated as a call of the form operator ""X("f")

Otherwise (S contains a numeric literal operator template), L is treated as a call of the form operator ""X<'c1', 'c2', ... 'ck'>() where f is the source character sequence c1c2...ck.

[Note 2:

The sequence c1c2...ck can only contain characters from the basic character set.

end note]
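For a user-defined-floating-point-literal, the cooked form receives a `long double`, while the raw form sees the source spelling, exponent included. A sketch assuming C++17; the suffixes `_half` and `_src` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Cooked form: 3.0_half is operator""_half(3.0L).
constexpr long double operator""_half(long double f) { return f / 2; }

// Raw form: 2.5e3_src is operator""_src("2.5e3"); the argument is a string
// literal with static storage duration, so returning the pointer is safe.
const char* operator""_src(const char* s) { return s; }
```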

If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character).

If S contains a literal operator template with a constant template parameter for which str is a well-formed template-argument, the literal L is treated as a call of the form operator ""X<str>()

Otherwise, the literal L is treated as a call of the form operator ""X(str, len)
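The (str, len) form can be sketched as follows; note that len counts code units without the terminating null. This assumes C++17, and the suffixes `_len` and `_sv16` are hypothetical. The template form with a class-type constant template parameter needs C++20 and is not exercised here.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// "hello"_len is operator""_len("hello", 5).
constexpr std::size_t operator""_len(const char*, std::size_t len) { return len; }

// u"one"_sv16 is operator""_sv16(u"one", 3).
constexpr std::u16string_view operator""_sv16(const char16_t* str, std::size_t len) {
    return std::u16string_view(str, len);
}

static_assert("hello"_len == 5);
static_assert(""_len == 0);
```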

If L is a user-defined-character-literal, let ch be the literal without its ud-suffix.

S shall contain a literal operator whose only parameter has the type of ch and the literal L is treated as a call of the form operator ""X(ch)
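For a user-defined-character-literal, the parameter type must match the character literal's type from Table 9, so a suffix meant for several encodings needs one overload per type. A sketch assuming C++17; the suffix `_cp` is hypothetical. With only these two overloads, `'x'_cp` (type `char`) would be ill-formed.

```cpp
#include <cassert>

// U'A'_cp is operator""_cp(U'A'); u'A'_cp is operator""_cp(u'A').
constexpr long operator""_cp(char32_t ch) { return static_cast<long>(ch); }
constexpr long operator""_cp(char16_t ch) { return static_cast<long>(ch); }
```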

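[Example 2:

long double operator ""_w(long double);
std::string operator ""_w(const char16_t*, std::size_t);
unsigned    operator ""_w(const char*);
int main() {
  1.2_w;      // calls operator ""_w(1.2L)
  u"one"_w;   // calls operator ""_w(u"one", 3)
  12_w;       // calls operator ""_w("12")
  "two"_w;    // error: no applicable literal operator
}

end example]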

In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated and user-defined-string-literals are considered string-literals for that purpose.

During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in [lex.string].

At the end of phase 6, if a string-literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.

[Example 3:

int main() {
  L"A" "B" "C"_x;   // OK, same as L"ABC"_x
  "P"_x "Q" "R"_y;  // error: two different ud-suffixes
}

end example]
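Concatenation with a ud-suffix can be observed with a narrow-string version of the `_x` suffix from Example 3. This is a sketch assuming C++17; the definition of `operator""_x` below is hypothetical, since the example only uses the suffix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical definition: builds a std::string from the (already concatenated) literal.
std::string operator""_x(const char* str, std::size_t len) { return std::string(str, len); }
```

With this definition, `"A" "B" "C"_x` is concatenated first and then becomes `operator""_x("ABC", 3)`, and `"P"_x "Q"` is equivalent to `"PQ"_x`.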