5 Lexical conventions [lex]

5.13 Literals [lex.literal]

5.13.3 Character literals [lex.ccon]

basic-c-char:
any member of the translation character set except the U+0027 apostrophe,
U+005c reverse solidus, or new-line character

simple-escape-sequence-char: one of
' " ? \ a b f n r t v

conditional-escape-sequence-char:
any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters N, o, u, U, or x
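
As a rough illustration of how these character classes partition what can follow a backslash, the sketch below classifies a single character. The names `EscapeKind`, `one_of`, and `classify_escape_char` are hypothetical and not part of the standard, and the code assumes (without checking) that its argument is already a member of the basic character set.

```cpp
// Hypothetical classification of the character that follows a backslash.
enum class EscapeKind { Simple, OctalDigit, ReservedIntroducer, Conditional };

// True if c occurs in the null-terminated string set.
constexpr bool one_of(char c, const char* set) {
    for (; *set != '\0'; ++set) {
        if (*set == c) return true;
    }
    return false;
}

constexpr EscapeKind classify_escape_char(char c) {
    // simple-escape-sequence-char: one of ' " ? \ a b f n r t v
    if (one_of(c, "'\"?\\abfnrtv")) return EscapeKind::Simple;
    // octal-digit: 0 through 7
    if (c >= '0' && c <= '7') return EscapeKind::OctalDigit;
    // N, o, u, U, and x are excluded from conditional-escape-sequence-char
    if (one_of(c, "NouUx")) return EscapeKind::ReservedIntroducer;
    // Assumption: c is in the basic character set, so everything else is
    // a conditional-escape-sequence-char.
    return EscapeKind::Conditional;
}

static_assert(classify_escape_char('n') == EscapeKind::Simple, "n is a simple escape char");
static_assert(classify_escape_char('8') == EscapeKind::Conditional, "8 is not an octal-digit");
```

Under this model, a digit such as 8 lands in the conditional class because it is neither an octal-digit nor a simple-escape-sequence-char.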

A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char.

A multicharacter literal shall not have an encoding-prefix.

If a multicharacter literal contains a c-char that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed.

Multicharacter literals are conditionally-supported.

The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.

Table 9 — Character literals [tab:lex.ccon.literal]

Encoding prefix	Kind	Type	Associated character encoding	Example
none	ordinary character literal	char	ordinary literal encoding	'v'
none	multicharacter literal	int	ordinary literal encoding	'abcd'
L	wide character literal	wchar_t	wide literal encoding	L'w'
u8	UTF-8 character literal	char8_t	UTF-8	u8'x'
u	UTF-16 character literal	char16_t	UTF-16	u'y'
U	UTF-32 character literal	char32_t	UTF-32	U'z'
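
The type column of Table 9 can be checked at compile time. The following is a minimal sketch, not normative text. It guards the `u8` case with the `__cpp_char8_t` feature-test macro because `char8_t` is only available when the compiler is in a mode that provides it.

```cpp
#include <type_traits>

static_assert(std::is_same<decltype('v'), char>::value, "ordinary character literal");
static_assert(std::is_same<decltype(L'w'), wchar_t>::value, "wide character literal");
static_assert(std::is_same<decltype(u'y'), char16_t>::value, "UTF-16 character literal");
static_assert(std::is_same<decltype(U'z'), char32_t>::value, "UTF-32 character literal");

#if defined(__cpp_char8_t)
// Only checked when the implementation provides char8_t.
static_assert(std::is_same<decltype(u8'x'), char8_t>::value, "UTF-8 character literal");
#endif

// For the UTF-16 and UTF-32 rows the encoding is fixed, so the values are too.
static_assert(u'y' == 0x79, "U+0079 as a UTF-16 code unit");
static_assert(U'z' == 0x7A, "U+007A as a UTF-32 code unit");
```

The values of `'v'` and `L'w'` are not asserted here because the ordinary and wide literal encodings are chosen by the implementation.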

In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.
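
One observable consequence is that a character-literal used in an `#if` condition should agree with the same literal used in an ordinary expression. The sketch below illustrates this under the assumption that the implementation follows the wording above. The variable names are hypothetical.

```cpp
// Evaluated during preprocessing (translation phase 4).
#if '\xff' < 0
constexpr bool negative_in_preprocessor = true;
#else
constexpr bool negative_in_preprocessor = false;
#endif

// Evaluated as an ordinary constant expression (translation phase 7).
constexpr bool negative_in_expression = '\xff' < 0;

// Whether either value is true depends on the signedness of char,
// but both phases are expected to agree.
static_assert(negative_in_preprocessor == negative_in_expression,
              "phase 4 and phase 7 agree on the value of the literal");
```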

A multicharacter literal has an implementation-defined value.
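
A short sketch of what can and cannot be relied on for a multicharacter literal follows. Only the type is checked. The variable name `two_chars` is hypothetical. Because support is conditional, compilers commonly emit a warning for the literal, and an implementation is allowed to reject it.

```cpp
#include <type_traits>

// 'ab' has two c-chars and no encoding-prefix, so it is a multicharacter literal.
constexpr auto two_chars = 'ab';

// constexpr implies const, hence "const int" for the deduced variable type.
static_assert(std::is_same<decltype(two_chars), const int>::value,
              "a multicharacter literal has type int");
static_assert(sizeof('ab') == sizeof(int), "not sizeof(char)");

// The numeric value is implementation-defined, so nothing portable is
// asserted about two_chars itself.
```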

The value of any other kind of character-literal is determined as follows:

- A character-literal with a c-char-sequence consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name is the code unit value of the specified character as encoded in the literal's associated character encoding. If the specified character lacks representation in the literal's associated character encoding or if it cannot be encoded as a single code unit, then the program is ill-formed.
- A character-literal with a c-char-sequence consisting of a single numeric-escape-sequence has a value as follows:
  - Let v be the integer value represented by the octal number comprising the sequence of octal-digits in an octal-escape-sequence or by the hexadecimal number comprising the sequence of hexadecimal-digits in a hexadecimal-escape-sequence.
  - If v does not exceed the range of representable values of the character-literal's type, then the value is v.
  - Otherwise, if the character-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to v modulo $2^N$, where N is the width of T.
  - Otherwise, the program is ill-formed.
- A character-literal with a c-char-sequence consisting of a single conditional-escape-sequence is conditionally-supported and has an implementation-defined value.
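
The rules above can be illustrated with a few compile-time checks. The helper `ordinary_numeric_escape` and the struct `EscapeResult` are hypothetical names introduced only for this sketch. The helper models the numeric-escape rule for a literal without an encoding-prefix, and it assumes that `long` is wider than `char`.

```cpp
#include <climits>
#include <limits>

// A universal-character-name yields the code unit value in the associated encoding.
static_assert(u'\u00e9' == 0x00E9, "U+00E9 is a single UTF-16 code unit");
static_assert(U'\U0001F600' == 0x1F600, "any code point is a single UTF-32 code unit");
// u8'\u00e9' and u'\U0001F600' would be ill-formed: each needs more than one code unit.

// A numeric escape whose value fits the literal's type keeps that value.
static_assert('\x41' == 65 && '\101' == 65, "hex 41 and octal 101 both denote 65");

// Hypothetical model of the numeric-escape rule for an unprefixed literal.
struct EscapeResult {
    bool well_formed;
    long value;
};

constexpr EscapeResult ordinary_numeric_escape(unsigned long v) {
    // Case 1: v is representable in char, so the value is v.
    if (v <= static_cast<unsigned long>(std::numeric_limits<char>::max())) {
        return EscapeResult{true, static_cast<long>(v)};
    }
    // Case 2: v fits unsigned char; take the value congruent to v modulo 2^N.
    // This branch is only reachable when char is signed.
    if (v <= static_cast<unsigned long>(std::numeric_limits<unsigned char>::max())) {
        return EscapeResult{true, static_cast<long>(v) - (1L << CHAR_BIT)};
    }
    // Case 3: the program is ill-formed.
    return EscapeResult{false, 0};
}

// The model agrees with the compiler whether char is signed or unsigned.
static_assert(ordinary_numeric_escape(0xff).value == '\xff', "wraps only if char is signed");
static_assert(!ordinary_numeric_escape(UCHAR_MAX + 1ul).well_formed, "too large for char");
```

For literals with a `u8`, `u`, or `U` prefix the second case does not apply, so an out-of-range numeric escape such as `u8'\x100'` makes the program ill-formed.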

The character specified by a simple-escape-sequence is specified in Table 10.

[Note 1:

Using an escape sequence for a question mark is supported for compatibility with C++ 2014 and C.

— _end note_]

Table 10 — Simple escape sequences [tab:lex.ccon.esc]

Code point	Character	simple-escape-sequence
U+000a	line feed	\n
U+0009	character tabulation	\t
U+000b	line tabulation	\v
U+0008	backspace	\b
U+000d	carriage return	\r
U+000c	form feed	\f
U+0007	alert	\a
U+005c	reverse solidus	\\
U+003f	question mark	\?
U+0027	apostrophe	\'
U+0022	quotation mark	\"
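
Table 10 can be cross-checked with UTF-32 character literals, because a UTF-32 code unit value equals the code point of the character. This is an illustrative sketch rather than part of the specification.

```cpp
// Each U'...' literal below is a single simple-escape-sequence; its value is
// the code point listed in Table 10.
static_assert(U'\n' == 0x000A, "line feed");
static_assert(U'\t' == 0x0009, "character tabulation");
static_assert(U'\v' == 0x000B, "line tabulation");
static_assert(U'\b' == 0x0008, "backspace");
static_assert(U'\r' == 0x000D, "carriage return");
static_assert(U'\f' == 0x000C, "form feed");
static_assert(U'\a' == 0x0007, "alert");
static_assert(U'\\' == 0x005C, "reverse solidus");
static_assert(U'\?' == 0x003F, "question mark");
static_assert(U'\'' == 0x0027, "apostrophe");
static_assert(U'\"' == 0x0022, "quotation mark");

// The question mark and the quotation mark may also appear unescaped
// in a character-literal; both spellings denote the same character.
static_assert('\?' == '?' && '\"' == '"', "escaped and unescaped forms agree");
```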