5 Lexical conventions [lex]
5.13 Literals [lex.literal]
5.13.2 Integer literals [lex.icon]
integer-literal:
    binary-literal integer-suffix opt
    octal-literal integer-suffix opt
    decimal-literal integer-suffix opt
    hexadecimal-literal integer-suffix opt
binary-literal:
    0b binary-digit
    0B binary-digit
    binary-literal ' opt binary-digit
octal-literal:
    0
    octal-literal ' opt octal-digit
decimal-literal:
    nonzero-digit
    decimal-literal ' opt digit
hexadecimal-literal:
    hexadecimal-prefix hexadecimal-digit-sequence
binary-digit: one of 0 1
octal-digit: one of 0 1 2 3 4 5 6 7
nonzero-digit: one of 1 2 3 4 5 6 7 8 9
hexadecimal-prefix: one of 0x 0X
hexadecimal-digit-sequence:
    hexadecimal-digit
    hexadecimal-digit-sequence ' opt hexadecimal-digit
hexadecimal-digit: one of 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
integer-suffix:
    unsigned-suffix long-suffix opt
    unsigned-suffix long-long-suffix opt
    long-suffix unsigned-suffix opt
    long-long-suffix unsigned-suffix opt
unsigned-suffix: one of u U
long-suffix: one of l L
long-long-suffix: one of ll LL
In an integer-literal, the sequence of binary-digits, octal-digits, digits, or hexadecimal-digits is interpreted as a base N integer as shown in Table 7; the lexically first digit of the sequence of digits is the most significant.
[ Note
:
The prefix and any optional separating single quotes are ignored when determining the value.
— end note
]
Table 7: Base of integer-literals [tab:lex.icon.base]
| Kind of integer-literal | base N |
|---|---|
| binary-literal | 2 |
| octal-literal | 8 |
| decimal-literal | 10 |
| hexadecimal-literal | 16 |
The hexadecimal-digits a through f and A through F have decimal values ten through fifteen.
[ Example
:
The number twelve can be written 12, 014, 0XC, or 0b1100.
The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value.
— end example
]
The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix in which its value can be represented.
An integer-literal is a prvalue.
Table 8: Types of integer-literals [tab:lex.icon.type]
| integer-suffix | decimal-literal | integer-literal other than decimal-literal |
|---|---|---|
| none | int, long int, long long int | int, unsigned int, long int, unsigned long int, long long int, unsigned long long int |
| u or U | unsigned int, unsigned long int, unsigned long long int | unsigned int, unsigned long int, unsigned long long int |
| l or L | long int, long long int | long int, unsigned long int, long long int, unsigned long long int |
| Both u or U and l or L | unsigned long int, unsigned long long int | unsigned long int, unsigned long long int |
| ll or LL | long long int | long long int, unsigned long long int |
| Both u or U and ll or LL | unsigned long long int | unsigned long long int |
If an integer-literal cannot be represented by any type in its list and an extended integer type ([basic.fundamental]) can represent its value, it may have that extended integer type.
If all of the types in the list for the integer-literal are signed, the extended integer type shall be signed.
If all of the types in the list for the integer-literal are unsigned, the extended integer type shall be unsigned.
If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned.
A program is ill-formed if one of its translation units contains an integer-literal that cannot be represented by any of the allowed types.
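The rules for bases, digit separators, and suffix-driven type selection can be checked at compile time. The following is a minimal sketch, assuming a C++17 or later compiler; the helper name check_integer_literals is hypothetical and not part of the standard text.

```cpp
#include <cassert>
#include <type_traits>

// One value, four bases (Table 7): decimal, octal, hexadecimal, binary.
static_assert(12 == 014 && 12 == 0XC && 12 == 0b1100);

// Prefixes and digit separators do not contribute to the value.
static_assert(1048576 == 1'048'576 && 0X100000 == 0x10'0000);
static_assert(1048576 == 0'004'000'000);

// The suffix selects the candidate list (Table 8). The value 12 fits the
// first type of every list, so these hold on any conforming implementation.
static_assert(std::is_same_v<decltype(12), int>);
static_assert(std::is_same_v<decltype(12u), unsigned int>);
static_assert(std::is_same_v<decltype(12L), long int>);
static_assert(std::is_same_v<decltype(12uL), unsigned long int>);
static_assert(std::is_same_v<decltype(12LL), long long int>);
static_assert(std::is_same_v<decltype(12uLL), unsigned long long int>);

// An unsuffixed decimal-literal only has signed candidate types,
// whichever of them ends up being large enough.
static_assert(std::is_signed_v<decltype(4294967295)>);

// Hypothetical helper that repeats two of the value checks at run time.
inline void check_integer_literals() {
    assert(0b1100 == 12);
    assert(0'004'000'000 == 0x10'0000);
}
```

The exact type of a large unsuffixed literal such as 4294967295 depends on the widths of int and long int, which is why the sketch only asserts its signedness.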
5.13.3 Character literals [lex.ccon]
character-literal:
    encoding-prefix opt ' c-char-sequence '
encoding-prefix: one of u8 u U L
c-char-sequence:
    c-char
    c-char-sequence c-char
c-char:
    any member of the basic source character set except the single-quote ', backslash \, or new-line character
    escape-sequence
    universal-character-name
escape-sequence:
    simple-escape-sequence
    octal-escape-sequence
    hexadecimal-escape-sequence
simple-escape-sequence: one of \' \" \? \\ \a \b \f \n \r \t \v
octal-escape-sequence:
    \ octal-digit
    \ octal-digit octal-digit
    \ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
    \x hexadecimal-digit
    hexadecimal-escape-sequence hexadecimal-digit
A character-literal that does not begin with u8, u, U, or L is an ordinary character literal.
An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.
An ordinary character literal that contains more than one c-char is a multicharacter literal.
A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character set, is conditionally-supported, has type int, and has an implementation-defined value.
A character-literal that begins with u8, such as u8'w', is a character-literal of type char8_t, known as a UTF-8 character literal.
The value of a UTF-8 character literal is equal to its ISO/IEC 10646 code point value, provided that the code point value can be encoded as a single UTF-8 code unit.
[ Note
:
That is, provided the code point value is in the range 0x0-0x7F (hexadecimal).
— end note
]
If the value is not representable with a single UTF-8 code unit, the program is ill-formed.
A UTF-8 character literal containing multiple c-chars is ill-formed.
A character-literal that begins with the letter u, such as u'x', is a character-literal of type char16_t, known as a UTF-16 character literal.
The value of a UTF-16 character literal is equal to its ISO/IEC 10646 code point value, provided that the code point value is representable with a single 16-bit code unit.
[ Note
:
That is, provided the code point value is in the range 0x0-0xFFFF (hexadecimal).
— end note
]
If the value is not representable with a single 16-bit code unit, the program is ill-formed.
A UTF-16 character literal containing multiple c-chars is ill-formed.
A character-literal that begins with the letter U, such as U'y', is a character-literal of type char32_t, known as a UTF-32 character literal.
The value of a UTF-32 character literal containing a single c-char is equal to its ISO/IEC 10646 code point value.
A UTF-32 character literal containing multiple c-chars is ill-formed.
A character-literal that begins with the letter L, such as L'z', is a wide-character literal.
A wide-character literal has type wchar_t.
The value of a wide-character literal containing a single c-char is equal to the numerical value of the encoding of the c-char in the execution wide-character set, unless the c-char has no representation in the execution wide-character set, in which case the value is implementation-defined.
[ Note
:
The type wchar_t is able to represent all members of the execution wide-character set (see [basic.fundamental]).
— end note
]
The value of a wide-character literal containing multiple c-chars is implementation-defined.
Certain non-graphic characters, the single quote ', the double quote ", the question mark ?, and the backslash \, can be represented according to Table 9.
The double quote " and the question mark ? can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively.
Escape sequences in which the character following the backslash is not listed in Table 9 are conditionally-supported, with implementation-defined semantics.
An escape sequence specifies a single character.
Table 9: Escape sequences [tab:lex.ccon.esc]
| Character name | Abbreviation or character | Escape sequence |
|---|---|---|
| new-line | NL(LF) | \n |
| horizontal tab | HT | \t |
| vertical tab | VT | \v |
| backspace | BS | \b |
| carriage return | CR | \r |
| form feed | FF | \f |
| alert | BEL | \a |
| backslash | \ | \\ |
| question mark | ? | \? |
| single quote | ' | \' |
| double quote | " | \" |
| octal number | ooo | \ooo |
| hex number | hhh | \xhhh |
The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character.
The escape \xhhh consists of the backslash followed by x followed by one or more hexadecimal digits that are taken to specify the value of the desired character.
There is no limit to the number of digits in a hexadecimal sequence.
A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively.
The value of a character-literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for character-literals with no prefix) or wchar_t (for character-literals prefixed by L).
[ Note
:
If the value of a character-literal prefixed by u, u8, or U is outside the range defined for its type, the program is ill-formed.
— end note
]
A universal-character-name is translated to the encoding, in the appropriate execution character set, of the character named.
If there is no such encoding, the universal-character-name is translated to an implementation-defined encoding.
[ Note
:
In translation phase 1, a universal-character-name is introduced whenever an actual extended character is encountered in the source text.
However, the actual compiler implementation may use its own native character set, so long as the same results are obtained.
— end note
]
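The mapping from encoding-prefix to type, and the way numeric escapes specify a value directly, can be illustrated with a few compile-time checks. This is a sketch assuming C++17 or later; the char8_t check is guarded by the __cpp_char8_t feature-test macro because that type exists only from C++20, and the helper name check_character_literals is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// The encoding-prefix determines the type of a single-c-char literal.
static_assert(std::is_same_v<decltype('a'), char>);
static_assert(std::is_same_v<decltype(u'a'), char16_t>);
static_assert(std::is_same_v<decltype(U'a'), char32_t>);
static_assert(std::is_same_v<decltype(L'a'), wchar_t>);
#if defined(__cpp_char8_t)
static_assert(std::is_same_v<decltype(u8'a'), char8_t>);
#endif

// UTF character literals have their ISO/IEC 10646 code point as value.
static_assert(u8'A' == 0x41);
static_assert(u'\u00E9' == 0x00E9);
static_assert(U'\U0001F600' == 0x1F600);

// Octal and hexadecimal escapes name the same value in different bases.
static_assert('\101' == '\x41');
static_assert('\0' == 0);

// The double quote may be written as itself or escaped; the single quote
// and the backslash must be escaped inside a character literal.
static_assert('"' == '\"');

// Hypothetical helper repeating two of the checks at run time.
inline void check_character_literals() {
    assert('\101' == '\x41');
    assert(U'\U0001F600' == 0x1F600);
}
```

Multicharacter literals are left out because they are conditionally-supported and their values are implementation-defined.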
5.13.4 Floating-point literals [lex.fcon]
floating-point-literal:
    decimal-floating-point-literal
    hexadecimal-floating-point-literal
decimal-floating-point-literal:
    fractional-constant exponent-part opt floating-point-suffix opt
    digit-sequence exponent-part floating-point-suffix opt
hexadecimal-floating-point-literal:
    hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-point-suffix opt
    hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-point-suffix opt
fractional-constant:
    digit-sequence opt . digit-sequence
    digit-sequence .
hexadecimal-fractional-constant:
    hexadecimal-digit-sequence opt . hexadecimal-digit-sequence
    hexadecimal-digit-sequence .
exponent-part:
    e sign opt digit-sequence
    E sign opt digit-sequence
binary-exponent-part:
    p sign opt digit-sequence
    P sign opt digit-sequence
sign: one of + -
digit-sequence:
    digit
    digit-sequence ' opt digit
floating-point-suffix: one of f l F L
The type of a floating-point-literal is determined by its floating-point-suffix as specified in Table 10.
Table 10: Types of floating-point-literals [tab:lex.fcon.type]
| floating-point-suffix | type |
|---|---|
| none | double |
| f or F | float |
| l or L | long double |
The significand of a floating-point-literal is the fractional-constant or digit-sequence of a decimal-floating-point-literal or the hexadecimal-fractional-constant or hexadecimal-digit-sequence of a hexadecimal-floating-point-literal.
In the significand, the sequence of digits or hexadecimal-digits and optional period are interpreted as a base N real number s, where N is 10 for a decimal-floating-point-literal and 16 for a hexadecimal-floating-point-literal.
[ Note
:
Any optional separating single quotes are ignored when determining the value.
— end note
]
If an exponent-part or binary-exponent-part is present, the exponent e of the floating-point-literal is the result of interpreting the sequence of an optional sign and the digits as a base 10 integer.
Otherwise, the exponent e is 0.
The scaled value of the literal is $s \times 10^{e}$ for a decimal-floating-point-literal and $s \times 2^{e}$ for a hexadecimal-floating-point-literal.
[ Example
:
The floating-point-literals 49.625 and 0xC.68p+2 have the same value.
The floating-point-literals 1.602'176'565e-19 and 1.602176565e-19 have the same value.
— end example
]
If the scaled value is not in the range of representable values for its type, the program is ill-formed.
Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
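As a worked instance of the scaled value: 0xC.68p+2 has significand s = 0xC.68 = 12 + 6/16 + 8/256 = 12.40625 and exponent e = 2, so its scaled value is 12.40625 × 2² = 49.625. The sketch below checks this and the suffix-to-type mapping of Table 10; it assumes C++17 or later (hexadecimal floating-point literals), and check_floating_literals is a hypothetical helper name.

```cpp
#include <cassert>
#include <type_traits>

// Table 10: the suffix alone determines the type.
static_assert(std::is_same_v<decltype(1.5), double>);
static_assert(std::is_same_v<decltype(1.5f), float>);
static_assert(std::is_same_v<decltype(1.5L), long double>);

// Decimal literal: s = 49.625, e = 0.
// Hexadecimal literal: s = 0xC.68 = 12.40625, e = 2, scaled by 2^2.
static_assert(49.625 == 0xC.68p+2);

// Digit separators are ignored when determining the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19);

// The exponent digits are base 10 in both forms; only the scale base differs.
static_assert(1e3 == 1000.0);
static_assert(0x1p-1 == 0.5);

// Hypothetical helper repeating the main check at run time.
inline void check_floating_literals() {
    constexpr double scaled_hex = 0xC.68p+2;
    assert(scaled_hex == 49.625);
}
```

The values used here are exactly representable in binary floating point, so the equality comparisons do not depend on the implementation-defined rounding choice.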
5.13.5 String literals [lex.string]
string-literal:
    encoding-prefix opt " s-char-sequence opt "
    encoding-prefix opt R raw-string
s-char-sequence:
    s-char
    s-char-sequence s-char
s-char:
    any member of the basic source character set except the double-quote ", backslash \, or new-line character
    escape-sequence
    universal-character-name
raw-string:
    " d-char-sequence opt ( r-char-sequence opt ) d-char-sequence opt "
r-char-sequence:
    r-char
    r-char-sequence r-char
r-char:
    any member of the source character set, except a right parenthesis ) followed by the initial d-char-sequence (which may be empty) followed by a double quote ".
d-char-sequence:
    d-char
    d-char-sequence d-char
d-char:
    any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline.
A string-literal that has an R in the prefix is a raw string literal.
The d-char-sequence serves as a delimiter.
The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence.
A d-char-sequence shall consist of at most 16 characters.
[ Note
:
The characters '(' and ')' are permitted in a raw-string.
Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".
— end note
]
[ Note
:
A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal.
Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
— end note
]
[ Example
:
The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n".
The raw string
R"(x = "\"y\"")"
is equivalent to "x = \"\\\"y\\\"\"".
— end example
]
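The delimiter and escaping rules for raw string literals can be verified with std::string_view comparisons. This is a minimal sketch assuming C++17 or later; the variable names and the helper check_raw_strings are hypothetical.

```cpp
#include <cassert>
#include <string_view>

// The delimiter "delimiter" lets ")" followed by a quote appear in the body.
constexpr std::string_view with_delim = R"delimiter((a|b))delimiter";
static_assert(with_delim == "(a|b)");

// Backslashes and double quotes are taken literally in a raw string,
// so each one needs an escape in the equivalent non-raw literal.
constexpr std::string_view quoted = R"(x = "\"y\"")";
static_assert(quoted == "x = \"\\\"y\\\"\"");

// A source-file new-line becomes a new-line character in the literal.
constexpr std::string_view two_lines = R"(a
b)";
static_assert(two_lines == "a\nb");

// Hypothetical helper repeating the length checks at run time.
inline void check_raw_strings() {
    assert(with_delim.size() == 5);
    assert(two_lines.size() == 3);
}
```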
After translation phase 6, a string-literal that does not begin with an encoding-prefix is an ordinary string literal.
An ordinary string literal has type “array of n const char” where n is the size of the string as defined below, has static storage duration ([basic.stc]), and is initialized with the given characters.
A string-literal that begins with u8, such as u8"asdf", is a UTF-8 string literal.
A UTF-8 string literal has type “array of n const char8_t”, where n is the size of the string as defined below; each successive element of the object representation ([basic.types]) has the value of the corresponding code unit of the UTF-8 encoding of the string.
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals.
A string-literal that begins with u, such as u"asdf", is a UTF-16 string literal.
A UTF-16 string literal has type “array of n const char16_t”, where n is the size of the string as defined below; each successive element of the array has the value of the corresponding code unit of the UTF-16 encoding of the string.
[ Note
:
A single c-char may produce more than one char16_t character in the form of surrogate pairs.
A surrogate pair is a representation for a single code point as a sequence of two 16-bit code units.
— end note
]
A string-literal that begins with U, such as U"asdf", is a UTF-32 string literal.
A UTF-32 string literal has type “array of n const char32_t”, where n is the size of the string as defined below; each successive element of the array has the value of the corresponding code unit of the UTF-32 encoding of the string.
A string-literal that begins with L, such as L"asdf", is a wide string literal.
A wide string literal has type “array of n const wchar_t”, where n is the size of the string as defined below; it is initialized with the given characters.
In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated.
If both string-literals have the same encoding-prefix, the resulting concatenated string-literal has that encoding-prefix.
If one string-literal has no encoding-prefix, it is treated as a string-literal of the same encoding-prefix as the other operand.
If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed.
Any other concatenations are conditionally-supported with implementation-defined behavior.
[ Note
:
This concatenation is an interpretation, not a conversion.
Because the interpretation happens in translation phase 6 (after each character from a string-literal has been translated into a value from the appropriate character set), a string-literal's initial rawness has no effect on the interpretation or well-formedness of the concatenation.
— end note
]
Table 11 has some examples of valid concatenations.
Table 11: String literal concatenations [tab:lex.string.concat]
| Source | Means | Source | Means | Source | Means |
|---|---|---|---|---|---|
| u"a" u"b" | u"ab" | U"a" U"b" | U"ab" | L"a" L"b" | L"ab" |
| u"a" "b" | u"ab" | U"a" "b" | U"ab" | L"a" "b" | L"ab" |
| "a" u"b" | u"ab" | "a" U"b" | U"ab" | "a" L"b" | L"ab" |
Characters in concatenated strings are kept distinct.
[ Example
:
"\xA" "B"
contains the two characters '\xA' and 'B' after concatenation (and not the single hexadecimal character '\xAB').
— end example
]
After any necessary concatenation, in translation phase 7 ([lex.phases]), '\0' is appended to every string-literal so that programs that scan a string can find its end.
Escape sequences and universal-character-names in non-raw string literals have the same meaning as in character-literals ([lex.ccon]), except that the single quote ' is representable either by itself or by the escape sequence \', and the double quote " shall be preceded by a \, and except that a universal-character-name in a UTF-16 string literal may yield a surrogate pair.
In a narrow string literal, a universal-character-name may map to more than one char or char8_t element due to multibyte encoding.
The size of a char32_t or wide string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for the terminating U'\0' or L'\0'.
The size of a UTF-16 string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for each character requiring a surrogate pair, plus one for the terminating u'\0'.
[ Note
:
The size of a char16_t string literal is the number of code units, not the number of characters.
— end note
]
The size of a narrow string literal is the total number of escape sequences and other characters, plus at least one for the multibyte encoding of each universal-character-name, plus one for the terminating '\0'.
Evaluating a string-literal results in a string literal object with static storage duration, initialized from the given characters as specified above.
Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
[ Note
:
The effect of attempting to modify a string-literal is undefined.
— end note
]
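The array types, the size rules, and the effect of concatenation can be observed through decltype and sizeof. The following is a sketch assuming C++17 or later; it relies on a string literal being an lvalue, so decltype yields a reference to the array type. The names utf16_units and check_string_literals are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// "asdf" has four characters plus the terminating null: n = 5.
static_assert(std::is_same_v<decltype("asdf"), const char (&)[5]>);
static_assert(std::is_same_v<decltype(u"asdf"), const char16_t (&)[5]>);
static_assert(std::is_same_v<decltype(U"asdf"), const char32_t (&)[5]>);
static_assert(std::is_same_v<decltype(L"asdf"), const wchar_t (&)[5]>);

// Concatenation: an unprefixed literal adopts the prefix of its neighbor.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" L"b"), const wchar_t (&)[3]>);

// Characters stay distinct: '\xA', then 'B', then the terminating '\0'.
static_assert(sizeof("\xA" "B") == 3);

// A code point above 0xFFFF needs a surrogate pair in UTF-16 (two code
// units) but a single code unit in UTF-32; add one for the terminator.
constexpr std::size_t utf16_units = sizeof(u"\U0001F600") / sizeof(char16_t);
constexpr std::size_t utf32_units = sizeof(U"\U0001F600") / sizeof(char32_t);
static_assert(utf16_units == 3 && utf32_units == 2);

// Hypothetical helper repeating two of the checks at run time.
inline void check_string_literals() {
    assert(sizeof("\xA" "B") == 3);
    assert(utf16_units == 3);
}
```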
5.13.6 Boolean literals [lex.bool]
boolean-literal:
    false
    true
The Boolean literals are the keywords false and true.
Such literals are prvalues and have type bool.
5.13.7 Pointer literals [lex.nullptr]
pointer-literal: nullptr
The pointer literal is the keyword nullptr.
It is a prvalue of type std::nullptr_t.
[ Note
:
std::nullptr_t is a distinct type that is neither a pointer type nor a pointer-to-member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value.
— end note
]
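The distinction between std::nullptr_t and the pointer types it converts to can be made concrete with type traits and a small overload set. This is a sketch assuming C++17 or later; Widget, which, and check_nullptr are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// nullptr has its own type, which is neither a pointer type nor a
// pointer-to-member type.
static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);
static_assert(!std::is_pointer_v<std::nullptr_t>);
static_assert(!std::is_member_pointer_v<std::nullptr_t>);

struct Widget { int value; };  // hypothetical class for the member pointer

// nullptr converts to a null pointer value and a null member pointer value.
constexpr int* p = nullptr;
constexpr int Widget::* pm = nullptr;
static_assert(p == nullptr && pm == nullptr);

// Hypothetical overload set: the literal 0 is an int first, whereas
// nullptr can only match the pointer overload.
constexpr int which(int) { return 1; }
constexpr int which(const char*) { return 2; }
static_assert(which(0) == 1);
static_assert(which(nullptr) == 2);

// Hypothetical helper repeating the overload checks at run time.
inline void check_nullptr() {
    assert(which(nullptr) == 2);
    assert(which(0) == 1);
}
```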
5.13.8 User-defined literals [lex.ext]
user-defined-literal:
    user-defined-integer-literal
    user-defined-floating-point-literal
    user-defined-string-literal
    user-defined-character-literal
user-defined-integer-literal:
    decimal-literal ud-suffix
    octal-literal ud-suffix
    hexadecimal-literal ud-suffix
    binary-literal ud-suffix
user-defined-floating-point-literal:
    fractional-constant exponent-part opt ud-suffix
    digit-sequence exponent-part ud-suffix
    hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
    hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
user-defined-string-literal:
    string-literal ud-suffix
user-defined-character-literal:
    character-literal ud-suffix
ud-suffix:
    identifier
If a token matches both user-defined-literal and another literal kind, it is treated as the latter.
[ Example
:
123_km is a user-defined-literal, but 12LL is an integer-literal.
— end example
]
The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.
A user-defined-literal is treated as a call to a literal operator or literal operator template ([over.literal]).
To determine the form of this call for a given user-defined-literal L with ud-suffix X, the literal-operator-id whose literal suffix identifier is X is looked up in the context of L using the rules for unqualified name lookup.
Let S be the set of declarations found by this lookup.
S shall not be empty.
If L is a user-defined-integer-literal, let n be the literal without its ud-suffix.
If S contains a literal operator with parameter type unsigned long long, the literal L is treated as a call of the form
operator "" X(nULL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.
If S contains a raw literal operator, the literal L is treated as a call of the form
operator "" X("n")
Otherwise (S contains a numeric literal operator template), L is treated as a call of the form
operator "" X<'c1', 'c2', ... 'ck'>()
where n is the source character sequence c1c2...ck.
[ Note
:
The sequence can only contain characters from the basic source character set.
— end note
]
If L is a user-defined-floating-point-literal, let f be the literal without its ud-suffix.
If S contains a literal operator with parameter type long double, the literal L is treated as a call of the form
operator "" X(fL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.
If S contains a raw literal operator, the literal L is treated as a call of the form
operator "" X("f")
Otherwise (S contains a numeric literal operator template), L is treated as a call of the form
operator "" X<'c1', 'c2', ... 'ck'>()
where f is the source character sequence c1c2...ck.
[ Note
:
The sequence can only contain characters from the basic source character set.
— end note
]
If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character).
If S contains a literal operator template with a non-type template parameter for which str is a well-formed template-argument, the literal L is treated as a call of the form
operator "" X<str>()
Otherwise, the literal L is treated as a call of the form
operator "" X(str, len)
If L is a user-defined-character-literal, let ch be the literal without its ud-suffix.
S shall contain a literal operator whose only parameter has the type of ch and the literal L is treated as a call of the form
operator "" X(ch)
[ Example
:
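long double operator "" _w(long double);
std::string operator "" _w(const char16_t*, std::size_t);
unsigned operator "" _w(const char*);
int main() {
  1.2_w;      // calls operator "" _w(1.2L)
  u"one"_w;   // calls operator "" _w(u"one", 3)
  12_w;       // calls operator "" _w("12")
  "two"_w;    // error: no applicable literal operator
}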
— end example
]
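Each call form described above can be exercised with a small literal operator. The following is a sketch assuming C++17 or later; apart from _km, which appears in the text, the suffixes _len, _count, _size, and _code and the helper check_user_defined_literals are hypothetical. The operators are declared without a space after the quotes, which is an equivalent spelling of the form used in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Cooked integer form: 5_km is treated as operator""_km(5ULL).
constexpr unsigned long long operator""_km(unsigned long long n) { return n * 1000; }

// Cooked floating-point form: 1.5_km is treated as operator""_km(1.5L).
constexpr long double operator""_km(long double f) { return f * 1000; }

// Raw form: 12_len is treated as operator""_len("12").
inline std::size_t operator""_len(const char* digits) { return std::strlen(digits); }

// Numeric literal operator template: 101_count is treated as
// operator""_count<'1', '0', '1'>().
template <char... Cs>
constexpr std::size_t operator""_count() { return sizeof...(Cs); }

// String form: "abc"_size is treated as operator""_size("abc", 3).
constexpr std::size_t operator""_size(const char*, std::size_t len) { return len; }

// Character form: 'A'_code is treated as operator""_code('A').
constexpr int operator""_code(char c) { return c; }

static_assert(5_km == 5000);
static_assert(1.5_km == 1500.0L);
static_assert(101_count == 3);
static_assert("abc"_size == 3);
static_assert('A'_code == 'A');

// Hypothetical helper for the check that needs run-time evaluation.
inline void check_user_defined_literals() {
    assert(12_len == 2);
    assert(0x1F_len == 4);  // the raw form sees the prefix characters too
}
```

The raw and template forms receive the spelling of the literal rather than its value, which is why 0x1F_len reports four characters.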
In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated and user-defined-string-literals are considered string-literals for that purpose.
During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in [lex.string].
At the end of phase 6, if a string-literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.
[ Example
:
int main() {
  L"A" "B" "C"_x;   // OK: same as L"ABC"_x
  "P"_x "Q" "R"_y;  // error: two different ud-suffixes
}
— end example
]
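The well-formed half of this rule can be compiled as a sketch, assuming C++17 or later and two hypothetical overloads of a literal operator with suffix _x that simply return the length they are given. The ill-formed case with two different suffixes is omitted because it would not compile.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical literal operators: each returns the code-unit count it receives.
constexpr std::size_t operator""_x(const wchar_t*, std::size_t len) { return len; }
constexpr std::size_t operator""_x(const char*, std::size_t len) { return len; }

// The pieces are concatenated to L"ABC" first; _x is then applied once,
// so the wchar_t overload is called with len == 3.
static_assert(L"A" "B" "C"_x == 3);

// Several pieces may carry the suffix as long as it is the same one.
static_assert("P"_x "Q"_x == 2);

// Hypothetical helper repeating the checks at run time.
inline void check_suffix_concatenation() {
    assert(L"A" "B" "C"_x == 3);
    assert("P"_x "Q"_x == 2);
}
```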