
5 Lexical conventions [lex]

5.1 Separate translation [lex.separate]

The text of the program is kept in units called source files in this document.

A source file together with all the headers ([headers]) and source files included via the preprocessing directive #include, less any source lines skipped by any of the conditional inclusion ([cpp.cond]) preprocessing directives, as modified by the implementation-defined behavior of any conditionally-supported-directives ([cpp.pre]) and pragmas ([cpp.pragma]), if any, is called a preprocessing translation unit.

[Note 1:

A C++ program need not all be translated at the same time.

— _end note_]

[Note 2:

Previously translated translation units and instantiation units can be preserved individually or in libraries.

The separate translation units of a program communicate ([basic.link]) by (for example) calls to functions whose identifiers have external or module linkage, manipulation of objects whose identifiers have external or module linkage, or manipulation of data files.

Translation units can be separately translated and then later linked to produce an executable program.

— _end note_]

5.2 Phases of translation [lex.phases]

The precedence among the syntax rules of translation is specified by the following phases.

1. An implementation shall support input files that are a sequence of UTF-8 code units (UTF-8 files). It may also support an implementation-defined set of other kinds of input files, and, if so, the kind of an input file is determined in an implementation-defined manner that includes a means of designating input files as UTF-8 files, independent of their content. [Note 1: In other words, recognizing the U+feff byte order mark is not sufficient. — _end note_] If an input file is determined to be a UTF-8 file, then it shall be a well-formed UTF-8 code unit sequence and it is decoded to produce a sequence of Unicode scalar values. A sequence of translation character set elements ([lex.charset]) is then formed by mapping each Unicode scalar value to the corresponding translation character set element. In the resulting sequence, each pair of characters in the input sequence consisting of U+000d carriage return followed by U+000a line feed, as well as each U+000d carriage return not immediately followed by a U+000a line feed, is replaced by a single new-line character. For any other kind of input file supported by the implementation, characters are mapped, in an implementation-defined manner, to a sequence of translation character set elements, representing end-of-line indicators as new-line characters.
2. If the first translation character is U+feff byte order mark, it is deleted. Each sequence of a backslash character (\) immediately followed by zero or more whitespace characters other than new-line followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. [Note 2: Line splicing can form a universal-character-name ([lex.charset]). — _end note_] A source file that is not empty and that (after splicing) does not end in a new-line character shall be processed as if an additional new-line character were appended to the file.
3. The source file is decomposed into preprocessing tokens ([lex.pptoken]) and sequences of whitespace characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. (A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that requires a terminating sequence of characters, such as a header-name that is missing the closing " or >. A partial comment would arise from a source file ending with an unclosed /* comment.) Each comment ([lex.comment]) is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of whitespace characters other than new-line is retained or replaced by one space character is unspecified. As characters from the source file are consumed to form the next preprocessing token (i.e., not being consumed as part of a comment or other forms of whitespace), except when matching a c-char-sequence, s-char-sequence, r-char-sequence, h-char-sequence, or q-char-sequence, universal-character-names are recognized ([lex.universal.char]) and replaced by the designated element of the translation character set ([lex.charset]). The process of dividing a source file's characters into preprocessing tokens is context-dependent. [Example 1: See the handling of < within a #include preprocessing directive ([cpp.include]). — _end example_]
4. The source file is analyzed as a preprocessing-file ([cpp.pre]). Preprocessing directives ([cpp]) are executed, macro invocations are expanded ([cpp.replace]), and _Pragma unary operator expressions are executed ([cpp.pragma.op]). A #include preprocessing directive ([cpp.include]) causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
5. For a sequence of two or more adjacent string-literal preprocessing tokens, a common encoding-prefix is determined as specified in [lex.string]. Each such string-literal preprocessing token is then considered to have that common encoding-prefix.
6. Adjacent string-literal preprocessing tokens are concatenated ([lex.string]).
7. Each preprocessing token is converted into a token ([lex.token]). Whitespace characters separating tokens are no longer significant. The resulting tokens constitute a translation unit and are syntactically and semantically analyzed as a translation-unit ([basic.link]) and translated. [Note 3: The process of analyzing and translating the tokens can occasionally result in one token being replaced by a sequence of other tokens ([temp.names]). — _end note_] It is implementation-defined whether the sources for module units and header units on which the current translation unit has an interface dependency ([module.unit], [module.import]) are required to be available. [Note 4: Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation. — _end note_]
8. Translated translation units and instantiation units are combined as follows: [Note 5: Some or all of these can be supplied from a library. — _end note_] Each translated translation unit is examined to produce a list of required instantiations. [Note 6: This can include instantiations which have been explicitly requested ([temp.explicit]). — _end note_] The definitions of the required templates are located. It is implementation-defined whether the source of the translation units containing these definitions is required to be available. [Note 7: An implementation can choose to encode sufficient information into the translated translation unit so as to ensure the source is not required here. — _end note_] All the required instantiations are performed to produce instantiation units. [Note 8: These are similar to translated translation units, but contain no references to uninstantiated templates and no template definitions. — _end note_] The program is ill-formed if any instantiation fails.
9. All external entity references are resolved. Library components are linked to satisfy external references to entities not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.
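Several of these phases can be seen at work in an ordinary translation unit. The sketch below assumes a C++17 or later compiler, and the names GREETING, answer and message are only illustrative. It shows line splicing (phase 2), a comment being replaced by one space (phase 3), macro expansion (phase 4) and concatenation of adjacent string literals (phase 6).

```cpp
#include <string_view>

// Phase 2: the backslash-new-line pair is deleted, so this macro
// definition is a single logical source line.
#define GREETING "hello, " \
                 "world"

// Phase 3: the comment is replaced by one space, so `int` and `answer`
// remain two separate preprocessing tokens.
constexpr int/* comment */answer = 42;

// Phase 4 expands GREETING; phase 6 concatenates the two adjacent
// string literals into one array of 13 chars (12 characters plus null).
constexpr char message[] = GREETING;

static_assert(sizeof(message) == 13);
static_assert(std::string_view(message) == "hello, world");
static_assert(answer == 42);
```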

5.3 Characters [lex.char]

5.3.1 Character sets [lex.charset]

The translation character set consists of the following elements:

[Note 1:

Unicode code points are integers in the range [0, 10FFFF] (hexadecimal).

A surrogate code point is a value in the range [D800, DFFF] (hexadecimal).

A Unicode scalar value is any code point that is not a surrogate code point.

— _end note_]

The basic character set is a subset of the translation character set, consisting of 99 characters as specified in Table 1.

[Note 2:

Unicode short names are given only as a means to identifying the character; the numerical value has no other meaning in this context.

— _end note_]

Table 1 — Basic character set [tab:lex.charset.basic]

character                                      glyph
U+0009 character tabulation
U+000b line tabulation
U+000c form feed
U+0020 space
U+000a line feed                               new-line
U+0021 exclamation mark                        !
U+0022 quotation mark                          "
U+0023 number sign                             #
U+0024 dollar sign                             $
U+0025 percent sign                            %
U+0026 ampersand                               &
U+0027 apostrophe                              '
U+0028 left parenthesis                        (
U+0029 right parenthesis                       )
U+002a asterisk                                *
U+002b plus sign                               +
U+002c comma                                   ,
U+002d hyphen-minus                            -
U+002e full stop                               .
U+002f solidus                                 /
U+0030 .. U+0039 digit zero .. nine            0 1 2 3 4 5 6 7 8 9
U+003a colon                                   :
U+003b semicolon                               ;
U+003c less-than sign                          <
U+003d equals sign                             =
U+003e greater-than sign                       >
U+003f question mark                           ?
U+0040 commercial at                           @
U+0041 .. U+005a latin capital letter a .. z   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
U+005b left square bracket                     [
U+005c reverse solidus                         \
U+005d right square bracket                    ]
U+005e circumflex accent                       ^
U+005f low line                                _
U+0060 grave accent                            `
U+0061 .. U+007a latin small letter a .. z     a b c d e f g h i j k l m n o p q r s t u v w x y z
U+007b left curly bracket                      {
U+007c vertical line                           |
U+007d right curly bracket                     }
U+007e tilde                                   ~

The basic literal character set consists of all characters of the basic character set, plus the control characters specified in Table 2.

Table 2 — Additional control characters in the basic literal character set [tab:lex.charset.literal]

U+0000 null
U+0007 alert
U+0008 backspace
U+000d carriage return

Characters in a character-literal other than a multicharacter or non-encodable character literal or in a string-literal are encoded as a sequence of one or more code units, as determined by the encoding-prefix ([lex.ccon], [lex.string]); this is termed the respective literal encoding.

The ordinary literal encoding is the encoding applied to an ordinary character or string literal.

The wide literal encoding is the encoding applied to a wide character or string literal.

A literal encoding or a locale-specific encoding of one of the execution character sets ([character.seq]) encodes each element of the basic literal character set as a single code unit with non-negative value, distinct from the code unit for any other such element.

[Note 3:

A character not in the basic literal character set can be encoded with more than one code unit; the value of such a code unit can be the same as that of a code unit for an element of the basic literal character set.

— _end note_]

The U+0000 null character is encoded as the value 0.

No other element of the translation character set is encoded with a code unit of value 0.

The code unit value of each decimal digit character after the digit 0 (U+0030) shall be one greater than the value of the previous.

The ordinary and wide literal encodings are otherwise implementation-defined.

For a UTF-8, UTF-16, or UTF-32 literal, the implementation shall encode the Unicode scalar value corresponding to each character of the translation character set as specified in the Unicode Standard for the respective Unicode encoding form.

5.3.2 Universal character names [lex.universal.char]

n-char:
any member of the translation character set except the U+007d right curly bracket or new-line character

The universal-character-name construct provides a way to name any element in the translation character set using just the basic character set.

If a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character-literal or string-literal (in either case, including within a user-defined-literal) corresponds to a control character or to a character in the basic character set, the program is ill-formed.

[Note 1:

A sequence of characters resembling a universal-character-name in an r-char-sequence ([lex.string]) does not form a universal-character-name.

— _end note_]

A universal-character-name of the form \u hex-quad, \U hex-quad hex-quad, or \u{simple-hexadecimal-digit-sequence} designates the character in the translation character set whose Unicode scalar value is the hexadecimal number represented by the sequence of hexadecimal-digits in the universal-character-name.

The program is ill-formed if that number is not a Unicode scalar value.

A universal-character-name that is a named-universal-character designates the corresponding character in the Unicode Standard (chapter 4.8 Name) if the n-char-sequence is equal to its character name or to one of its character name aliases of type “control”, “correction”, or “alternate”; otherwise, the program is ill-formed.

[Note 2:

These aliases are listed in the Unicode Character Database's NameAliases.txt.

None of these names or aliases have leading or trailing spaces.

— _end note_]

5.5 Preprocessing tokens [lex.pptoken]

preprocessing-token:

header-name
import-keyword
module-keyword
export-keyword
identifier
pp-number
character-literal
user-defined-character-literal
string-literal
user-defined-string-literal
preprocessing-op-or-punc
each non-whitespace character that cannot be one of the above

A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6.

In this document, glyphs are used to identify elements of the basic character set ([lex.charset]).

The categories of preprocessing token are: header names, placeholder tokens produced by preprocessing import and module directives (import-keyword, module-keyword, and export-keyword), identifiers, preprocessing numbers, character literals (including user-defined character literals), string literals (including user-defined string literals), preprocessing operators and punctuators, and single non-whitespace characters that do not lexically match the other preprocessing token categories.

If a U+0027 apostrophe or a U+0022 quotation mark character matches the last category, the program is ill-formed.

If any character not in the basic character set matches the last category, the program is ill-formed.

Preprocessing tokens can be separated by whitespace; this consists of comments ([lex.comment]), or whitespace characters (U+0020 space, U+0009 character tabulation, new-line, U+000b line tabulation, and U+000c form feed), or both.

As described in [cpp], in certain circumstances during translation phase 4, whitespace (or the absence thereof) serves as more than preprocessing token separation.

Whitespace can appear within a preprocessing token only as part of a header name or between the quotation characters in a character literal or string literal.

Each preprocessing token that is converted to a token ([lex.token]) shall have the lexical form of a keyword, an identifier, a literal, or an operator or punctuator.

The import-keyword is produced by preprocessing an import directive ([cpp.import]), the module-keyword is produced by preprocessing a module directive ([cpp.module]), and the export-keyword is produced by preprocessing either of the previous two directives.

[Note 1:

None has any observable spelling.

— _end note_]

If the input stream has been parsed into preprocessing tokens up to a given character:

[Example 1:

#define R "x"
const char* s = R"y";           // ill-formed raw string, not "x" "y"

— _end example_]

[Example 2:

The program fragment 0xe+foo is parsed as a preprocessing number token (one that is not a valid integer-literal or floating-point-literal token), even though a parse as three preprocessing tokens 0xe, +, and foo can produce a valid expression (for example, if foo is a macro defined as 1).

Similarly, the program fragment 1E1 is parsed as a preprocessing number (one that is a valid floating-point-literal token), whether or not E is a macro name.

— _end example_]

[Example 3:

The program fragment x+++++y is parsed as x++ ++ + y, which, if x and y have integral types, violates a constraint on increment operators, even though the parse x ++ + ++ y can yield a correct expression.

— _end example_]
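Because of the longest-match rule, whitespace is sometimes needed to get the intended tokens. The following is a minimal sketch, assuming C++14 or later; post_plus_pre is a hypothetical name.

```cpp
// Written as x+++++y this would lex as x ++ ++ + y and fail to compile.
// Whitespace forces the intended split x ++ + ++ y.
constexpr int post_plus_pre(int x, int y) {
    return x++ + ++y;   // old value of x plus incremented y
}
static_assert(post_plus_pre(1, 2) == 4);

// 1E1 is one preprocessing number that becomes a floating-point-literal.
static_assert(1E1 == 10.0);
```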

5.7 Preprocessing numbers [lex.ppnumber]

Preprocessing number tokens lexically include all integer-literal tokens ([lex.icon]) and all floating-point-literal tokens ([lex.fcon]).

A preprocessing number does not have a type or a value; it acquires both after a successful conversion to an integer-literal token or a floating-point-literal token.
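Stringizing makes the 0xe+foo case from Example 2 visible, because a macro name hidden inside a preprocessing number is never expanded. The sketch below assumes C++17. The macros foo, STR and XSTR are hypothetical helpers.

```cpp
#include <string_view>

#define foo 1
#define STR(x) #x
#define XSTR(x) STR(x)   // macro-expands x before stringizing it

// 0xe+foo is a single pp-number, so foo inside it is never seen as a macro.
static_assert(std::string_view(XSTR(0xe+foo)) == "0xe+foo");

// With spaces, foo is a separate identifier and is replaced by 1.
static_assert(std::string_view(XSTR(0xe + foo)) == "0xe + 1");

#undef foo
```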

5.8 Operators and punctuators [lex.operators]

The lexical representation of C++ programs includes a number of preprocessing tokens that are used in the syntax of the preprocessor or are converted into tokens for operators and punctuators:

preprocessing-operator: one of

# ## %: %:%:

operator-or-punctuator: one of
{ } [ ] ( )
<% %> <: :> ; : ...
? :: . .* -> ->* ~
! + - * / % ^ & |
= += -= *= /= %= ^= &= |=
== != < > <= >= <=> && ||
<< >> <<= >>= ++ -- ,
and or xor not bitand bitor compl
and_eq or_eq xor_eq not_eq

Each operator-or-punctuator is converted to a single token in translation phase 7 ([lex.phases]).

5.9 Alternative tokens [lex.digraph]

Alternative token representations are provided for some operators and punctuators.

In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling.

The set of alternative tokens is defined in Table 3.

Table 3 — Alternative tokens [tab:lex.digraph]

Alternative  Primary    Alternative  Primary    Alternative  Primary
<%           {          and          &&         and_eq       &=
%>           }          bitor        |          or_eq        |=
<:           [          or           ||         xor_eq       ^=
:>           ]          xor          ^          not          !
%:           #          compl        ~          not_eq       !=
%:%:         ##         bitand       &
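Since alternative tokens behave exactly like their primary tokens, code spelled entirely with them compiles to the same thing. The following is a minimal sketch, assuming a conforming C++17 compiler; TWICE, in_range and table are hypothetical names.

```cpp
%:define TWICE(x) ((x) + (x))          // %: is #

constexpr bool in_range(int v, int lo, int hi) <%   // <% is {
    return v >= lo and v <= hi;                      // and is &&
%>                                                   // %> is }

constexpr int table<:3:> = <%1, 2, 3%>;              // <: and :> are [ and ]
static_assert(table<:2:> == 3);

static_assert((6 xor 3) == 5);                       // xor is ^
static_assert(not in_range(TWICE(3), 0, 5));         // not is !
static_assert((compl 0) == -1);                      // compl is ~
```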

5.10 Tokens [lex.token]

There are five kinds of tokens: identifiers, keywords, literals, operators, and other separators.

Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, “whitespace”), as described below, are ignored except as they serve to separate tokens.

[Note 1:

Whitespace can separate otherwise adjacent identifiers, keywords, numeric literals, and alternative tokens containing alphabetic characters.

— _end note_]

5.11 Identifiers [lex.name]

identifier-start:
nondigit
an element of the translation character set with the Unicode property XID_Start

identifier-continue:
digit
nondigit
an element of the translation character set with the Unicode property XID_Continue

nondigit: one of
a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z _

digit: one of
0 1 2 3 4 5 6 7 8 9

[Note 1:

The character properties XID_Start and XID_Continue are described by UAX #44 of the Unicode Standard.

— _end note_]

The program is ill-formed if an identifier does not conform to Normalization Form C as specified in the Unicode Standard.

[Note 2:

Identifiers are case-sensitive.

— _end note_]

[Note 3:

[uaxid] compares the requirements of UAX #31 of the Unicode Standard with the C++ rules for identifiers.

— _end note_]

[Note 4:

In translation phase 4, identifier also includes those preprocessing-tokens ([lex.pptoken]) differentiated as keywords ([lex.key]) in the later translation phase 7 ([lex.token]).

— _end note_]

The identifiers in Table 4 have a special meaning when appearing in a certain context.

When referred to in the grammar, these identifiers are used explicitly rather than using the identifier grammar production.

Unless otherwise specified, any ambiguity as to whether a given identifier has a special meaning is resolved to interpret the token as a regular identifier.

In addition, some identifiers appearing as a token or preprocessing-token are reserved for use by C++ implementations and shall not be used otherwise; no diagnostic is required.
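The following sketch, assuming C++17 and a UTF-8 source file, illustrates three rules from this section: identifiers with special meaning, case sensitivity, and non-ASCII identifiers. All names are illustrative. It uses final and override as context-sensitive identifiers (their usual examples). Separately, names containing a double underscore or starting with an underscore followed by an uppercase letter are reserved for the implementation, so they are avoided here.

```cpp
// final and override are special only in certain contexts.
struct Base {
    virtual ~Base() = default;
    virtual int value() const { return 1; }
};
struct Derived final : Base {
    int value() const override { return 2; }
};

// Elsewhere they are ordinary identifiers.
constexpr int final = 3;
constexpr int override = 4;

// Identifiers are case-sensitive: Value and value are distinct names.
constexpr int Value = 5;
constexpr int value = 6;

// Non-ASCII identifiers use XID_Start/XID_Continue characters in NFC.
constexpr double π = 3.14159;
static_assert(π > 3.0);
```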

5.12 Keywords [lex.key]

keyword:
any identifier listed in Table 5
import-keyword
module-keyword
export-keyword

The identifiers shown in Table 5 are reserved for use as keywords (that is, they are unconditionally treated as keywords in phase 7) except in an attribute-token ([dcl.attr.grammar]).

[Note 1:

The register keyword is unused but is reserved for future use.

— _end note_]

Table 5 — Keywords [tab:lex.key]

alignas    constinit        extern     protected         throw
alignof    const_cast       false      public            true
asm        continue         float      register          try
auto       contract_assert  for        reinterpret_cast  typedef
bool       co_await         friend     requires          typeid
break      co_return        goto       return            typename
case       co_yield         if         short             union
catch      decltype         inline     signed            unsigned
char       default          int        sizeof            using
char8_t    delete           long       static            virtual
char16_t   do               mutable    static_assert     void
char32_t   double           namespace  static_cast       volatile
class      dynamic_cast     new        struct            wchar_t
concept    else             noexcept   switch            while
const      enum             nullptr    template
consteval  explicit         operator   this
constexpr  export           private    thread_local

Furthermore, the alternative representations shown in Table 6 for certain operators and punctuators ([lex.digraph]) are reserved and shall not be used otherwise.

Table 6 — Alternative representations [tab:lex.key.digraph]

and     and_eq  bitand  bitor   compl   not
not_eq  or      or_eq   xor     xor_eq

5.13 Literals [lex.literal]

5.13.2 Integer literals [lex.icon]

octal-digit: one of
0 1 2 3 4 5 6 7

nonzero-digit: one of
1 2 3 4 5 6 7 8 9

hexadecimal-prefix: one of
0x 0X

hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F

unsigned-suffix: one of
u U

long-long-suffix: one of
ll LL

In an integer-literal, the sequence of binary-digits, octal-digits, digits, or hexadecimal-digits is interpreted as a base N integer as shown in Table 7; the lexically first digit of the sequence of digits is the most significant.

[Note 1:

The prefix and any optional separating single quotes are ignored when determining the value.

— _end note_]

The hexadecimal-digits a through f and A through F have decimal values ten through fifteen.

[Example 1:

The number twelve can be written 12, 014, 0XC, or 0b1100.

The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value.

— _end example_]
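The value rule (pick the base from the prefix, ignore separators, read the most significant digit first) can be written as a small evaluator. This is a minimal sketch of that rule, not any compiler's implementation, assuming C++17. literal_value and digit_of are hypothetical helpers. They handle only well-formed, unsuffixed spellings whose value fits in 64 bits. digit_of looks the character up in a string because [lex.charset] guarantees consecutive code units only for the decimal digits, not for letters.

```cpp
#include <cstdint>
#include <string_view>

// Value of one hexadecimal-digit, without assuming letters are contiguous.
constexpr unsigned digit_of(char c) {
    constexpr std::string_view lower = "0123456789abcdef";
    constexpr std::string_view upper = "0123456789ABCDEF";
    const auto i = lower.find(c);
    return static_cast<unsigned>(i != std::string_view::npos ? i : upper.find(c));
}

// Evaluates the spelling of an unsuffixed integer-literal.
constexpr std::uint64_t literal_value(std::string_view s) {
    unsigned base = 10;
    if (s.size() > 1 && s[0] == '0') {
        if (s[1] == 'x' || s[1] == 'X') { base = 16; s.remove_prefix(2); }
        else if (s[1] == 'b' || s[1] == 'B') { base = 2; s.remove_prefix(2); }
        else { base = 8; s.remove_prefix(1); }   // leading 0 means octal
    }
    std::uint64_t value = 0;
    for (const char c : s) {
        if (c == '\'') continue;                 // digit separators are ignored
        value = value * base + digit_of(c);      // first digit is most significant
    }
    return value;
}

static_assert(literal_value("12") == 12 && literal_value("014") == 12);
static_assert(literal_value("0XC") == 12 && literal_value("0b1100") == 12);
static_assert(literal_value("0x10'0000") == 1048576);
static_assert(literal_value("0'004'000'000") == 1048576);
```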

The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix in which its value can be represented.

Table 8 — Types of integer-literals [tab:lex.icon.type]

integer-suffix            decimal-literal                                         integer-literal other than decimal-literal
none                      int                                                     int
                          long int                                                unsigned int
                          long long int                                           long int
                                                                                  unsigned long int
                                                                                  long long int
                                                                                  unsigned long long int
u or U                    unsigned int                                            unsigned int
                          unsigned long int                                       unsigned long int
                          unsigned long long int                                  unsigned long long int
l or L                    long int                                                long int
                          long long int                                           unsigned long int
                                                                                  long long int
                                                                                  unsigned long long int
Both u or U and l or L    unsigned long int                                       unsigned long int
                          unsigned long long int                                  unsigned long long int
ll or LL                  long long int                                           long long int
                                                                                  unsigned long long int
Both u or U and ll or LL  unsigned long long int                                  unsigned long long int
z or Z                    the signed integer type corresponding to std::size_t    the signed integer type corresponding to std::size_t
                          ([support.types.layout])                                std::size_t
Both u or U and z or Z    std::size_t                                             std::size_t

Except for integer-literals containing a size-suffix, if the value of an integer-literal cannot be represented by any type in its list and an extended integer type ([basic.fundamental]) can represent its value, it may have that extended integer type.

If all of the types in the list for the integer-literal are signed, the extended integer type is signed.

If all of the types in the list for the integer-literal are unsigned, the extended integer type is unsigned.

If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned.

If an integer-literal cannot be represented by any of the allowed types, the program is ill-formed.

[Note 2:

An integer-literal with a z or Z suffix is ill-formed if it cannot be represented by std::size_t.

— _end note_]
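The type selection in Table 8 can be checked with decltype. The following is a minimal sketch, assuming C++17. The z/uz checks are guarded by the __cpp_size_t_suffix feature-test macro (C++23). The hexadecimal versus decimal comparison only runs where int is 32 bits.

```cpp
#include <climits>
#include <cstddef>
#include <type_traits>

static_assert(std::is_same_v<decltype(42), int>);
static_assert(std::is_same_v<decltype(42u), unsigned int>);
static_assert(std::is_same_v<decltype(42l), long>);
static_assert(std::is_same_v<decltype(42ul), unsigned long>);
static_assert(std::is_same_v<decltype(42ll), long long>);
static_assert(std::is_same_v<decltype(42ull), unsigned long long>);

#if defined(__cpp_size_t_suffix)
static_assert(std::is_same_v<decltype(42uz), std::size_t>);
static_assert(std::is_same_v<decltype(42z), std::make_signed_t<std::size_t>>);
#endif

// With a 32-bit int, 0xFFFFFFFF fits unsigned int, which is in the list
// for non-decimal literals. The decimal spelling must use a wider signed type.
#if INT_MAX == 0x7FFFFFFF && UINT_MAX == 0xFFFFFFFF
static_assert(std::is_same_v<decltype(0xFFFFFFFF), unsigned int>);
static_assert(std::is_signed_v<decltype(4294967295)>);
#endif
```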

5.13.3 Character literals [lex.ccon]

encoding-prefix: one of
u8 u U L

basic-c-char:
any member of the translation character set except the U+0027 apostrophe,
U+005c reverse solidus, or new-line character

simple-escape-sequence-char: one of
' " ? \ a b f n r t v

conditional-escape-sequence-char:
any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters N, o, u, U, or x

A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char.

A multicharacter literal shall not have an encoding-prefix.

If a multicharacter literal contains a c-char that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed.

Multicharacter literals are conditionally-supported.

The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.

Table 9 — Character literals [tab:lex.ccon.literal]

Encoding prefix  Kind                        Type      Associated character encoding  Example
none             ordinary character literal  char      ordinary literal encoding      'v'
                 multicharacter literal      int       ordinary literal encoding      'abcd'
L                wide character literal      wchar_t   wide literal encoding          L'w'
u8               UTF-8 character literal     char8_t   UTF-8                          u8'x'
u                UTF-16 character literal    char16_t  UTF-16                         u'y'
U                UTF-32 character literal    char32_t  UTF-32                         U'z'

In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.

A multicharacter literal has an implementation-defined value.

The value of any other kind of character-literal is determined as follows:

The character specified by a simple-escape-sequence is specified in Table 10.

[Note 1:

Using an escape sequence for a question mark is supported for compatibility with C++ 2014 and C.

— _end note_]

Table 10 — Simple escape sequences [tab:lex.ccon.esc]

character                       simple-escape-sequence
U+000a line feed                \n
U+0009 character tabulation     \t
U+000b line tabulation          \v
U+0008 backspace                \b
U+000d carriage return          \r
U+000c form feed                \f
U+0007 alert                    \a
U+005c reverse solidus          \\
U+003f question mark            \?
U+0027 apostrophe               \'
U+0022 quotation mark           \"
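UTF-32 literals encode Unicode scalar values directly, so the code points in Table 10 can be checked exactly. Ordinary literals cannot be checked this way, because their encoding is implementation-defined. The following is a minimal sketch, assuming C++11 or later.

```cpp
static_assert(U'\n' == 0x0A && U'\t' == 0x09 && U'\v' == 0x0B && U'\b' == 0x08);
static_assert(U'\r' == 0x0D && U'\f' == 0x0C && U'\a' == 0x07);
static_assert(U'\\' == 0x5C && U'\?' == 0x3F && U'\'' == 0x27 && U'\"' == 0x22);

// \? and \" are not required where the plain character is unambiguous.
static_assert('\?' == '?' && U'\"' == U'"');
```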

5.13.4 Floating-point literals [lex.fcon]

floating-point-suffix: one of
f l f16 f32 f64 f128 bf16 F L F16 F32 F64 F128 BF16

[Note 1:

The floating-point suffixes f16, f32, f64, f128, bf16, F16, F32, F64, F128, and BF16 are conditionally-supported.

— _end note_]

Table 11 — Types of floating-point-literals [tab:lex.fcon.type]

floating-point-suffix  type
none                   double
f or F                 float
l or L                 long double
f16 or F16             std::float16_t
f32 or F32             std::float32_t
f64 or F64             std::float64_t
f128 or F128           std::float128_t
bf16 or BF16           std::bfloat16_t

The significand of a floating-point-literal is the fractional-constant or digit-sequence of a decimal-floating-point-literal or the hexadecimal-fractional-constant or hexadecimal-digit-sequence of a hexadecimal-floating-point-literal.

In the significand, the sequence of digits or hexadecimal-digits and optional period are interpreted as a base N real number s, where N is 10 for a decimal-floating-point-literal and 16 for a hexadecimal-floating-point-literal.

[Note 2:

Any optional separating single quotes are ignored when determining the value.

— _end note_]

If an exponent-part or binary-exponent-part is present, the exponent e of the floating-point-literal is the result of interpreting the sequence of an optional sign and the digits as a base 10 integer.

Otherwise, the exponent e is 0.

The scaled value of the literal is $s \times 10^{e}$ for a decimal-floating-point-literal and $s \times 2^{e}$ for a hexadecimal-floating-point-literal.

[Example 1:

The floating-point-literals 49.625 and 0xC.68p+2 have the same value.

The floating-point-literals 1.602'176'565e-19 and 1.602176565e-19 have the same value.

— _end example_]

If the scaled value is not in the range of representable values for its type, the program is ill-formed.

Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
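The suffix-to-type mapping and the scaled-value rule can be checked at compile time. The following sketch assumes C++17 for hexadecimal floating literals. The values chosen are exactly representable in binary floating point, so exact comparisons are safe. The std::float32_t check runs only where the implementation defines __STDCPP_FLOAT32_T__.

```cpp
#include <type_traits>

static_assert(std::is_same_v<decltype(1.0), double>);
static_assert(std::is_same_v<decltype(1.0f), float>);
static_assert(std::is_same_v<decltype(1.0L), long double>);

// Decimal: s * 10^e.  Hexadecimal: s * 2^e, with s read in base 16.
static_assert(1.5e2 == 150.0);         // 1.5 * 10^2
static_assert(0xC.68p+2 == 49.625);    // 12.40625 * 2^2
static_assert(0x1p-3 == 0.125);        // 1 * 2^-3

// Digit separators do not change the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19);

#if defined(__STDCPP_FLOAT32_T__)
#include <stdfloat>
static_assert(std::is_same_v<decltype(1.0f32), std::float32_t>);
#endif
```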

5.13.5 String literals [lex.string]

basic-s-char:
any member of the translation character set except the U+0022 quotation mark,
U+005c reverse solidus, or new-line character

r-char:
any member of the translation character set, except a U+0029 right parenthesis followed by
the initial d-char-sequence (which may be empty) followed by a U+0022 quotation mark

d-char:
any member of the basic character set except:
U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus,
U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line

The kind of a string-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding prefix and sequence of s-chars or r-chars as defined by Table 12, where n is the number of encoded code units that would result from an evaluation of the string-literal (see below).

Table 12 — String literals [tab:lex.string.literal]

Encoding prefix  Kind                     Type                       Associated character encoding  Examples
none             ordinary string literal  array of n const char      ordinary literal encoding      "ordinary string", R"(ordinary raw string)"
L                wide string literal      array of n const wchar_t   wide literal encoding          L"wide string", LR"w(wide raw string)w"
u8               UTF-8 string literal     array of n const char8_t   UTF-8                          u8"UTF-8 string", u8R"x(UTF-8 raw string)x"
u                UTF-16 string literal    array of n const char16_t  UTF-16                         u"UTF-16 string", uR"y(UTF-16 raw string)y"
U                UTF-32 string literal    array of n const char32_t  UTF-32                         U"UTF-32 string", UR"z(UTF-32 raw string)z"

A string-literal that has an R in the prefix is a raw string literal.

The d-char-sequence serves as a delimiter.

The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence.

A d-char-sequence shall consist of at most 16 characters.

[Note 1:

The characters '(' and ')' can appear in a raw-string.

Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".

— _end note_]

[Note 2:

A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal.

Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:

const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);

— _end note_]

[Example 1:

The raw string

R"a(
)\
a"
)a"

is equivalent to "\n)\\\na\"\n".

The raw string R"(x = "\"y\"")" is equivalent to "x = \"\\\"y\\\"\"".

— _end example_]
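The raw-string equivalences stated in this subsection can be checked with std::string_view comparisons. The following is a minimal sketch, assuming C++17; the variable names are illustrative. The second line of two_lines deliberately starts in the first column, because leading whitespace inside a raw string is kept.

```cpp
#include <string_view>

// The delimiter lets the raw string contain ")" freely.
constexpr std::string_view alternation = R"delimiter((a|b))delimiter";
static_assert(alternation == "(a|b)");

// Backslashes and quotes are ordinary characters inside a raw string.
constexpr std::string_view quoted = R"(x = "\"y\"")";
static_assert(quoted == "x = \"\\\"y\\\"\"");

// A source new-line becomes a new-line in the literal.
constexpr std::string_view two_lines = R"(first
second)";
static_assert(two_lines == "first\nsecond");
```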

Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals.

The string-literals in any sequence of adjacent string-literals shall have at most one unique encoding-prefix among them.

The common encoding-prefix of the sequence is that encoding-prefix, if any.

[Note 3:

A string-literal's rawness has no effect on the determination of the common encoding-prefix.

— _end note_]

In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated.

The lexical structure and grouping of the contents of the individual string-literals is retained.

[Example 2:

"\xA" "B" represents the code unit '\xA' and the character 'B' after concatenation (and not the single code unit '\xAB').

Similarly, R"(\u00)" "41" represents six characters, starting with a backslash and ending with the digit 1 (and not the single character 'A' specified by a universal-character-name).

Table 13 has some examples of valid concatenations.

— _end example_]

Table 13 — String literal concatenations [tab:lex.string.concat]

Source     Means    Source     Means    Source     Means
u"a" u"b"  u"ab"    U"a" U"b"  U"ab"    L"a" L"b"  L"ab"
u"a" "b"   u"ab"    U"a" "b"   U"ab"    L"a" "b"   L"ab"
"a" u"b"   u"ab"    "a" U"b"   U"ab"    "a" L"b"   L"ab"
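The concatenation rules, including the retained grouping from Example 2, can be checked in code. The following is a minimal sketch, assuming C++17; hex_then_b and not_a_ucn are hypothetical names.

```cpp
#include <string_view>

// An unprefixed literal takes on the other literal's encoding-prefix.
static_assert(std::u16string_view(u"a" "b") == u"ab");
static_assert(std::u32string_view("a" U"b") == U"ab");

// Each escape sequence ends at its own literal's boundary:
// "\xA" "B" is two code units, not the single code unit '\xAB'.
constexpr char hex_then_b[] = "\xA" "B";
static_assert(sizeof(hex_then_b) == 3 && hex_then_b[0] == '\xA' && hex_then_b[1] == 'B');

// R"(\u00)" "41" is six characters starting with a backslash, not 'A'.
constexpr char not_a_ucn[] = R"(\u00)" "41";
static_assert(sizeof(not_a_ucn) == 7 && not_a_ucn[0] == '\\');
```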

Evaluating a string-literal results in a string literal object with static storage duration ([basic.stc]).

[Note 4:

String literal objects are potentially non-unique ([intro.object]).

Whether successive evaluations of a string-literal yield the same or a different object is unspecified.

— _end note_]
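A sketch of what static storage duration and potential non-uniqueness mean in practice. The function names greeting and same_text are hypothetical. Returning a pointer into a string literal object is safe because the object outlives the call, but since identity across evaluations is unspecified, the sketch compares contents rather than addresses.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: the string literal object has static storage
// duration, so the returned pointer stays valid after greeting() returns.
const char* greeting() {
    return "hello";
}

// Hypothetical helper: compare contents, because whether two evaluations
// of a string-literal yield the same object is unspecified.
bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```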

[Note 5:

The effect of attempting to modify a string literal object is undefined.

— _end note_]
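Because modifying the string literal object itself is undefined, code that needs mutable text usually copies the literal into its own array first. The helper jello below is a hypothetical illustration of that pattern, not something the text prescribes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper. Writing through a pointer to the literal "hello"
// itself (for example via const_cast) would be undefined behavior.
std::string jello() {
    char buf[] = "hello";  // buf is a separate char[6] initialized from the literal
    buf[0] = 'j';          // modifies buf, not the string literal object
    return buf;            // constructs a std::string from buf's contents
}
```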

String literal objects are initialized with the sequence of code unit values corresponding to the string-literal's sequence of s-chars (originally from non-raw string literals) and r-chars (originally from raw string literals), plus a terminating U+0000 null character, in order as follows:

5.13.6 Unevaluated strings [lex.string.uneval]

An unevaluated-string shall have no encoding-prefix.

Each universal-character-name and each simple-escape-sequence in an unevaluated-string is replaced by the member of the translation character set it denotes.

An unevaluated-string that contains a numeric-escape-sequence or a conditional-escape-sequence is ill-formed.

An unevaluated-string is never evaluated and its interpretation depends on the context in which it appears.
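For orientation, a sketch of some contexts in which unevaluated strings appear in this draft: the message of a static_assert, the arguments of the deprecated and nodiscard attributes, and the "C" of a linkage specification. The function names are hypothetical, and the reason argument of nodiscard assumes C++20 or later.

```cpp
#include <cassert>
#include <climits>

// The static_assert message is an unevaluated string: it takes no
// encoding-prefix and never becomes a string literal object.
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long has at least 64 bits");

// Attribute arguments are unevaluated strings as well (hypothetical functions).
[[deprecated("use new_api() instead")]] int old_api() { return 41; }
[[nodiscard("the result carries the answer")]] int new_api() { return 42; }

// So is the "C" in a linkage specification (hypothetical function).
extern "C" int c_linkage_add(int a, int b) { return a + b; }
```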

5.13.7 Boolean literals [lex.bool]

boolean-literal:
false
true

The Boolean literals are the keywords false and true.

Such literals have type bool.
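A brief C++17 check that both Boolean literals have type bool. The helper as_int is hypothetical; it relies on the standard bool-to-int conversion, under which false becomes 0 and true becomes 1.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same_v<decltype(true), bool>);
static_assert(std::is_same_v<decltype(false), bool>);

// Hypothetical helper: the implicit bool-to-int conversion yields 0 or 1.
constexpr int as_int(bool b) { return b; }
static_assert(as_int(true) == 1 && as_int(false) == 0);
```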

5.13.8 Pointer literals [lex.nullptr]

The pointer literal is the keyword nullptr.

It has type std::nullptr_t.

[Note 1:

std::nullptr_t is a distinct type that is neither a pointer type nor a pointer-to-member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value.

— _end note_]
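A C++17 sketch of Note 1. The overloads named which and the struct S are hypothetical. Because std::nullptr_t is not an integer type, nullptr selects the pointer overload, while the integer literal 0 selects the int overload; nullptr also initializes both an object pointer and a member pointer to their null values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical overload pair used to observe overload resolution.
std::string which(int) { return "int"; }
std::string which(const char*) { return "pointer"; }

static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);
static_assert(!std::is_pointer_v<std::nullptr_t>);         // not a pointer type
static_assert(!std::is_member_pointer_v<std::nullptr_t>);  // not a pointer-to-member type

// Hypothetical struct: nullptr converts to a null member pointer value
// and to a null pointer value.
struct S { int m; };
constexpr int S::* null_member = nullptr;
constexpr const char* null_ptr = nullptr;
static_assert(null_member == nullptr && null_ptr == nullptr);
```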

5.13.9 User-defined literals [lex.ext]

If a token matches both user-defined-literal and another literal kind, it is treated as the latter.

[Example 1:

123_km is a user-defined-literal, but 12LL is an integer-literal.

— _end example_]

The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.

A user-defined-literal is treated as a call to a literal operator or literal operator template ([over.literal]).
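A minimal sketch of a user-defined-literal as a call, reusing the _km suffix from Example 1. The two literal operators and their meaning (kilometres to metres) are hypothetical; the point is that 3_km and the explicit call operator""_km(3ULL) denote the same thing.

```cpp
#include <cassert>

// Hypothetical literal operators for the suffix _km, returning metres.
constexpr unsigned long long operator""_km(unsigned long long km) { return km * 1000; }
constexpr long double operator""_km(long double km) { return km * 1000.0L; }

// 3_km is a user-defined-integer-literal, treated as operator""_km(3ULL).
static_assert(3_km == 3000, "integer form");
static_assert(3_km == operator""_km(3ULL), "same call spelled explicitly");

// 1.5_km is a user-defined-floating-point-literal, treated as operator""_km(1.5L).
static_assert(1.5_km == 1500.0L, "floating-point form");
```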

To determine the form of this call for a given user-defined-literal L with ud-suffix X, first let S be the set of declarations found by unqualified lookup for the literal-operator-id whose literal suffix identifier is X ([basic.lookup.unqual]).

S shall not be empty.

If L is a user-defined-integer-literal, let n be the literal without its ud-suffix.

If S contains a literal operator with parameter type unsigned long long, the literal L is treated as a call of the form

operator ""X(nULL)

Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.

If S contains a raw literal operator, the literal L is treated as a call of the form

operator ""X("n")

Otherwise (S contains a numeric literal operator template), L is treated as a call of the form

operator ""X<'c1', 'c2', ... 'ck'>()

where n is the source character sequence c1c2...ck.

[Note 1:

The sequence can only contain characters from the basic character set.

— _end note_]
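A sketch of the two non-cooked forms for integer literals. The suffixes _raw and _ndigits are hypothetical and deliberately distinct, since a single suffix may not have both a raw literal operator and a numeric literal operator template. Both receive the literal's source characters, here including the 0x prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical raw literal operator: receives the source characters of the
// literal (without its ud-suffix) as a null-terminated string.
std::string operator""_raw(const char* chars) { return chars; }

// Hypothetical numeric literal operator template: receives the same
// characters as a pack of char template arguments.
template <char... Cs>
constexpr std::size_t operator""_ndigits() { return sizeof...(Cs); }

static_assert(12345_ndigits == 5, "five source characters");
static_assert(0x1F_ndigits == 4, "'0', 'x', '1', 'F'");
```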

If L is a user-defined-floating-point-literal, let f be the literal without its ud-suffix.

If S contains a literal operator with parameter type long double, the literal L is treated as a call of the form

operator ""X(fL)

Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.

If S contains a raw literal operator, the literal L is treated as a call of the form

operator ""X("f")

Otherwise (S contains a numeric literal operator template), L is treated as a call of the form

operator ""X<'c1', 'c2', ... 'ck'>()

where f is the source character sequence c1c2...ck.

[Note 2:

The sequence can only contain characters from the basic character set.

— _end note_]
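The floating-point rules follow the same pattern. In this sketch the suffixes _half and _text are hypothetical: _half uses the cooked long double form, while _text uses a raw literal operator and therefore sees the exponent notation exactly as written.

```cpp
#include <cassert>
#include <string>

// Hypothetical cooked form: 2.5_half is treated as operator""_half(2.5L).
constexpr long double operator""_half(long double x) { return x / 2; }

// Hypothetical raw form: receives the literal's source characters unchanged.
std::string operator""_text(const char* chars) { return chars; }

static_assert(2.5_half == 1.25L, "cooked long double value");
```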

If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character).

If S contains a literal operator template with a constant template parameter for which str is a well-formed template-argument, the literal L is treated as a call of the form

operator ""X<str>()

Otherwise, the literal L is treated as a call of the form

operator ""X(str, len)

If L is a user-defined-character-literal, let ch be the literal without its ud-suffix.

S shall contain a literal operator whose only parameter has the type of ch, and the literal L is treated as a call of the form

operator ""X(ch)
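Because the parameter must have exactly the type of ch, each character type needs its own overload. The suffix _kind below is hypothetical and simply reports which overload was selected.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical overloads: one per character-literal type.
constexpr std::string_view operator""_kind(char) { return "char"; }
constexpr std::string_view operator""_kind(char16_t) { return "char16_t"; }
constexpr std::string_view operator""_kind(wchar_t) { return "wchar_t"; }

static_assert('a'_kind == "char");
```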

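[Example 2:

long double operator ""_w(long double);
std::string operator ""_w(const char16_t*, std::size_t);
unsigned operator ""_w(const char*);
int main() {
  1.2_w;        // calls operator ""_w(1.2L)
  u"one"_w;     // calls operator ""_w(u"one", 3)
  12_w;         // calls operator ""_w("12")
  "two"_w;      // error: no applicable literal operator
}

— _end example_]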

In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated and user-defined-string-literals are considered string-literals for that purpose.

During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in [lex.string].

At the end of phase 6, if a string-literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix, and that suffix is applied to the result of the concatenation.

[Example 3:

int main() {
  L"A" "B" "C"_x;     // OK, same as L"ABC"_x
  "P"_x "Q" "R"_y;    // error: two different ud-suffixes
}

— _end example_]
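A sketch of the concatenation rule with a hypothetical suffix _len that returns the len argument. Wherever the suffix appears among the participating pieces, it applies to the single concatenated literal "abcd".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical literal operator: returns the code-unit count it receives.
constexpr std::size_t operator""_len(const char*, std::size_t len) { return len; }

// The ud-suffix applies to the whole concatenated literal "abcd".
static_assert("ab" "cd"_len == 4, "suffix on the last piece");
static_assert("ab"_len "cd" == 4, "suffix on the first piece");
static_assert("ab"_len "cd"_len == 4, "same suffix on both pieces");
// "ab"_len "cd"_other would be ill-formed: two different ud-suffixes.
```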