5 Lexical conventions [lex]
5.13 Literals [lex.literal]
5.13.3 Character literals [lex.ccon]
encoding-prefix: one of
u8 u U L
basic-c-char:
any member of the translation character set except the U+0027 apostrophe,
U+005c reverse solidus, or new-line character
simple-escape-sequence-char: one of
' " ? \ a b f n r t v
conditional-escape-sequence-char:
any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters N, o, u, U, or x
A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char.
A multicharacter literal shall not have an encoding-prefix.
If a multicharacter literal contains a c-char that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed.
Multicharacter literals are conditionally-supported.
The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.
Table 9 — Character literals [tab:lex.ccon.literal]
Encoding prefix | Kind | Type | Associated character encoding | Example |
---|---|---|---|---|
none | ordinary character literal | char | ordinary literal encoding | 'v' |
none | multicharacter literal | int | ordinary literal encoding | 'abcd' |
L | wide character literal | wchar_t | wide literal encoding | L'w' |
u8 | UTF-8 character literal | char8_t | UTF-8 | u8'x' |
u | UTF-16 character literal | char16_t | UTF-16 | u'y' |
U | UTF-32 character literal | char32_t | UTF-32 | U'z' |
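As a compile-time illustration of Table 9, the following sketch checks the type of each non-multicharacter example literal. The helper `has_type` is a hypothetical name introduced here and is not part of the standard. The `u8` check is guarded by the `__cpp_char8_t` feature-test macro, on the assumption that `char8_t` is only available when that macro is defined. The multicharacter row is left out because such literals are conditionally-supported.

```cpp
#include <type_traits>

// Hypothetical helper (not part of the standard): reports whether the
// deduced type of its argument is exactly Expected.
template <class Expected, class Actual>
constexpr bool has_type(Actual) {
    return std::is_same<Expected, Actual>::value;
}

// One check per row of Table 9 (multicharacter literal omitted).
static_assert(has_type<char>('v'), "no prefix: ordinary character literal has type char");
static_assert(has_type<wchar_t>(L'w'), "L prefix: wide character literal has type wchar_t");
static_assert(has_type<char16_t>(u'y'), "u prefix: UTF-16 character literal has type char16_t");
static_assert(has_type<char32_t>(U'z'), "U prefix: UTF-32 character literal has type char32_t");

#if defined(__cpp_char8_t)
// Assumption: char8_t exists only when the feature-test macro is defined.
static_assert(has_type<char8_t>(u8'x'), "u8 prefix: UTF-8 character literal has type char8_t");
#endif
```

Because every check is a `static_assert`, a translation unit containing this block compiles only if the implementation assigns the types listed in the table.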
In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.
A multicharacter literal has an implementation-defined value.
The value of any other kind of character-literal is determined as follows:
- A character-literal with a c-char-sequence consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name is the code unit value of the specified character as encoded in the literal's associated character encoding.
  If the specified character lacks representation in the literal's associated character encoding or if it cannot be encoded as a single code unit, then the program is ill-formed.
- A character-literal with a c-char-sequence consisting of a single numeric-escape-sequence has a value as follows:
  - Let v be the integer value represented by the octal number comprising the sequence of octal-digits in an octal-escape-sequence or by the hexadecimal number comprising the sequence of hexadecimal-digits in a hexadecimal-escape-sequence.
  - If v does not exceed the range of representable values of the character-literal's type, then the value is v.
  - Otherwise, if the character-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to v modulo $2^N$, where N is the width of T.
  - Otherwise, the program is ill-formed.
- A character-literal with a c-char-sequence consisting of a single conditional-escape-sequence is conditionally-supported and has an implementation-defined value.
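The value rules above can be modelled with two small helpers. This is a sketch under stated assumptions, not a description of how any compiler implements them: `EscapeResult`, `numeric_escape_value`, `Utf`, and `is_single_code_unit` are hypothetical names, the type is described only by its width (assumed to be between 1 and 32 bits) and its signedness, and only the three UTF encodings are covered by the single-code-unit check, since the ordinary and wide literal encodings are implementation-defined.

```cpp
#include <cassert>

// Hypothetical result type: well_formed is false when the rules make the
// program ill-formed; value is meaningful only when well_formed is true.
struct EscapeResult {
    bool well_formed;
    long long value;
};

// Sketch of the numeric-escape-sequence rules.
// v: integer denoted by the octal or hexadecimal digits.
// width: width N of the literal's type T (assumption: 1 <= width <= 32).
// is_signed: whether T is a signed type.
// prefix_absent_or_L: true for ordinary and wide character literals.
EscapeResult numeric_escape_value(unsigned long long v, unsigned width,
                                  bool is_signed, bool prefix_absent_or_L) {
    const unsigned long long modulus = 1ULL << width;      // 2^N
    const unsigned long long unsigned_max = modulus - 1;   // max of the unsigned counterpart
    const unsigned long long type_max =
        is_signed ? (modulus >> 1) - 1 : unsigned_max;     // max of T itself

    if (v <= type_max) {
        return {true, static_cast<long long>(v)};          // value is v
    }
    if (prefix_absent_or_L && v <= unsigned_max) {
        // Unique value of T congruent to v modulo 2^N; reachable only for signed T.
        return {true, static_cast<long long>(v) - static_cast<long long>(modulus)};
    }
    return {false, 0};                                     // ill-formed
}

// Hypothetical tag for the three UTF literal encodings.
enum class Utf { utf8, utf16, utf32 };

// True if the code point cp is encoded as exactly one code unit in enc.
bool is_single_code_unit(char32_t cp, Utf enc) {
    const bool is_scalar = cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
    if (!is_scalar) return false;
    switch (enc) {
        case Utf::utf8:  return cp <= 0x7F;
        case Utf::utf16: return cp <= 0xFFFF;
        case Utf::utf32: return true;
    }
    return false;
}

// Numeric escapes whose value fits the type denote that value directly.
static_assert('\x41' == 65, "hexadecimal escape within range");
static_assert('\101' == 65, "octal escape within range");

// Small self-check of the model.
void check_character_literal_model() {
    assert(numeric_escape_value(0xFF, 8, true, true).value == -1);   // wraps modulo 2^8
    assert(numeric_escape_value(0xFF, 8, false, false).value == 255);
    assert(!numeric_escape_value(0x100, 8, false, false).well_formed);
    assert(!is_single_code_unit(0xE9, Utf::utf8));  // U+00E9 needs two UTF-8 code units
    assert(is_single_code_unit(0xE9, Utf::utf16));
}
```

Under this model, an escape such as `\xff` in an ordinary character literal yields -1 when `char` is a signed 8-bit type, while the same digits in a literal whose type is an unsigned 8-bit type yield 255.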
The character specified by a simple-escape-sequence is specified in Table 10.
[Note 1:
Using an escape sequence for a question mark is supported for compatibility with C++ 2014 and C.
— _end note_]
Table 10 — Simple escape sequences [tab:lex.ccon.esc]
Code point | Character | simple-escape-sequence |
---|---|---|
U+000a | line feed | \n |
U+0009 | character tabulation | \t |
U+000b | line tabulation | \v |
U+0008 | backspace | \b |
U+000d | carriage return | \r |
U+000c | form feed | \f |
U+0007 | alert | \a |
U+005c | reverse solidus | \\ |
U+003f | question mark | \? |
U+0027 | apostrophe | \' |
U+0022 | quotation mark | \" |
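Table 10 can be restated as a lookup from the character that follows the backslash to the code point it specifies. In this minimal sketch, `simple_escape_code_point` and `kNotSimpleEscape` are hypothetical names introduced for illustration. The `static_assert` lines use UTF-32 character literals so that the code unit value equals the code point regardless of the ordinary literal encoding.

```cpp
// Hypothetical sentinel for characters that are not a simple-escape-sequence-char.
constexpr char32_t kNotSimpleEscape = 0xFFFFFFFF;

// Hypothetical helper: maps the character after the backslash to the
// code point listed in Table 10.
char32_t simple_escape_code_point(char c) {
    switch (c) {
        case 'n':  return 0x000A;  // line feed
        case 't':  return 0x0009;  // character tabulation
        case 'v':  return 0x000B;  // line tabulation
        case 'b':  return 0x0008;  // backspace
        case 'r':  return 0x000D;  // carriage return
        case 'f':  return 0x000C;  // form feed
        case 'a':  return 0x0007;  // alert
        case '\\': return 0x005C;  // reverse solidus
        case '?':  return 0x003F;  // question mark
        case '\'': return 0x0027;  // apostrophe
        case '"':  return 0x0022;  // quotation mark
        default:   return kNotSimpleEscape;
    }
}

// Cross-check a few rows against the compiler's own UTF-32 literals.
static_assert(U'\n' == 0x000A, "line feed");
static_assert(U'\t' == 0x0009, "character tabulation");
static_assert(U'\\' == 0x005C, "reverse solidus");
static_assert(U'\?' == 0x003F, "question mark");
static_assert(U'\'' == 0x0027, "apostrophe");
static_assert(U'\"' == 0x0022, "quotation mark");
```

A switch is used here rather than a table so that every row of Table 10 is visible next to its character name.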