5 Lexical conventions [lex]

5.13 Literals [lex.literal]

5.13.5 String literals [lex.string]

basic-s-char:
any member of the translation character set except the U+0022 quotation mark,
U+005c reverse solidus, or new-line character

r-char:
any member of the translation character set, except a U+0029 right parenthesis followed by
the initial d-char-sequence (which may be empty) followed by a U+0022 quotation mark

d-char:
any member of the basic character set except:
U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus,
U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line

The kind of a string-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding prefix and sequence of s-chars or r-chars as defined by Table 12, where n is the number of encoded code units that would result from an evaluation of the string-literal (see below).

Table 12 — String literals [tab:lex.string.literal]

| Encoding prefix | Kind | Type | Associated character encoding | Examples |
|---|---|---|---|---|
| none | ordinary string literal | array of n const char | ordinary literal encoding | "ordinary string", R"(ordinary raw string)" |
| L | wide string literal | array of n const wchar_t | wide literal encoding | L"wide string", LR"w(wide raw string)w" |
| u8 | UTF-8 string literal | array of n const char8_t | UTF-8 | u8"UTF-8 string", u8R"x(UTF-8 raw string)x" |
| u | UTF-16 string literal | array of n const char16_t | UTF-16 | u"UTF-16 string", uR"y(UTF-16 raw string)y" |
| U | UTF-32 string literal | array of n const char32_t | UTF-32 | U"UTF-32 string", UR"z(UTF-32 raw string)z" |
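The type column of Table 12 can be checked at compile time. The following is a minimal sketch, not part of the normative text: `array_length` is a hypothetical helper introduced only for illustration, and the `char8_t` check is guarded because it assumes a C++20 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not from the standard): number of elements in an array.
template <class T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) { return N; }

// A string-literal is an lvalue, so decltype yields a reference to
// "array of n const T"; n includes the terminating null code unit.
static_assert(std::is_same_v<decltype("abc"), const char (&)[4]>);
static_assert(std::is_same_v<decltype(R"(abc)"), const char (&)[4]>);
static_assert(std::is_same_v<decltype(L"abc"), const wchar_t (&)[4]>);
static_assert(std::is_same_v<decltype(u"abc"), const char16_t (&)[4]>);
static_assert(std::is_same_v<decltype(U"abc"), const char32_t (&)[4]>);
#ifdef __cpp_char8_t  // char8_t is only available from C++20 on
static_assert(std::is_same_v<decltype(u8"abc"), const char8_t (&)[4]>);
#endif

// n counts code units, not characters: U+1F600 needs two UTF-16 code units
// but only one UTF-32 code unit (plus the terminating null in both cases).
static_assert(array_length(u"\U0001F600") == 3);
static_assert(array_length(U"\U0001F600") == 2);
```

The raw and non-raw forms share a type because rawness affects only how the characters are read, not the encoding prefix.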

A string-literal that has an R in the prefix is a raw string literal.

The d-char-sequence serves as a delimiter.

The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence.

A d-char-sequence shall consist of at most 16 characters.

[Note 1:

The characters '(' and ')' can appear in a raw-string.

Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".

— _end note_]
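The equivalence in Note 1 can be verified at compile time. This is an illustrative sketch only; the variable names are hypothetical, and the second literal is an additional example (not from the text) showing why a non-empty delimiter is useful.

```cpp
#include <cassert>
#include <string_view>

// The literal from Note 1: the delimiter is the d-char-sequence "delimiter".
constexpr std::string_view with_delim = R"delimiter((a|b))delimiter";
constexpr std::string_view plain = "(a|b)";
static_assert(with_delim == plain);

// Additional illustration: with the delimiter x, the sequence )" may appear
// in the body, because only )x" terminates the raw string.
constexpr std::string_view tricky = R"x(say ")" here)x";
static_assert(tricky == "say \")\" here");
```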

[Note 2:

A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal.

Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:

    const char* p = R"(a\
    b
    c)";
    assert(std::strcmp(p, "a\\\nb\nc") == 0);

— _end note_]

[Example 1:

The raw string

    R"a(
    )\
    a"
    )a"

is equivalent to "\n)\\\na\"\n".

The raw string R"(x = "\"y\"")" is equivalent to "x = \"\\\"y\\\"\"".

— _end example_]
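Both equivalences from Example 1 can be expressed as compile-time checks. This is a sketch with hypothetical variable names; it assumes the continuation lines of the first raw string start in column 0, since any leading whitespace would become part of the literal.

```cpp
#include <cassert>
#include <string_view>

// First raw string of Example 1. Source new-lines become new-line characters,
// and the backslash is an ordinary character rather than an escape.
constexpr std::string_view raw1 = R"a(
)\
a"
)a";
constexpr std::string_view esc1 = "\n)\\\na\"\n";
static_assert(raw1 == esc1);

// Second raw string of Example 1: quotes and backslashes need no escaping.
constexpr std::string_view raw2 = R"(x = "\"y\"")";
constexpr std::string_view esc2 = "x = \"\\\"y\\\"\"";
static_assert(raw2 == esc2);
```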

Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals.

The string-literals in any sequence of adjacent string-literals shall have at most one unique encoding-prefix among them.

[Note 3:

A string-literal's rawness has no effect on the determination of the common encoding-prefix.

— _end note_]

In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated.

The lexical structure and grouping of the contents of the individual string-literals is retained.

[Example 2:

"\xA" "B" represents the code unit '\xA' and the character 'B' after concatenation (and not the single code unit '\xAB').

Similarly, R"(\u00)" "41" represents six characters, starting with a backslash and ending with the digit 1 (and not the single character 'A' specified by a universal-character-name).

Table 13 has some examples of valid concatenations.

— _end example_]

Table 13 — String literal concatenations [tab:lex.string.concat]

| Source | Means | Source | Means | Source | Means |
|---|---|---|---|---|---|
| u"a" u"b" | u"ab" | U"a" U"b" | U"ab" | L"a" L"b" | L"ab" |
| u"a" "b" | u"ab" | U"a" "b" | U"ab" | L"a" "b" | L"ab" |
| "a" u"b" | u"ab" | "a" U"b" | U"ab" | "a" L"b" | L"ab" |
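The concatenation rules from Example 2 and Table 13 can be illustrated with compile-time checks. This is a minimal sketch; `code_unit_count` is a hypothetical helper introduced here, and the counts include the terminating null code unit.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not from the standard): number of elements in an array.
template <class T, std::size_t N>
constexpr std::size_t code_unit_count(const T (&)[N]) { return N; }

// "\xA" "B" keeps its grouping: code unit 0x0A, then 'B', then the null.
static_assert(code_unit_count("\xA" "B") == 3);
static_assert(("\xA" "B")[0] == '\xA');
static_assert(("\xA" "B")[1] == 'B');
// By contrast, "\xAB" is a single code unit followed by the null.
static_assert(code_unit_count("\xAB") == 2);

// R"(\u00)" "41" is six characters (\ u 0 0 4 1), not the single character 'A'.
static_assert(code_unit_count(R"(\u00)" "41") == 7);
static_assert((R"(\u00)" "41")[0] == '\\');
static_assert((R"(\u00)" "41")[5] == '1');

// Mixing one prefixed literal with unprefixed ones yields the prefixed kind.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" U"b"), const char32_t (&)[3]>);
static_assert(std::is_same_v<decltype(L"a" "b"), const wchar_t (&)[3]>);
```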

Evaluating a string-literal results in a string literal object with static storage duration ([basic.stc]).

[Note 4:

String literal objects are potentially non-unique ([intro.object]).

Whether successive evaluations of a string-literal yield the same or a different object is unspecified.

— _end note_]

[Note 5:

The effect of attempting to modify a string literal object is undefined.

— _end note_]
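The practical consequences of the storage rules and Notes 4 and 5 can be sketched as follows. The functions `greeting` and `same_text` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical function: returning a pointer to a string literal object is
// safe because that object has static storage duration.
const char* greeting() { return "hello"; }

// Compare contents rather than addresses: whether two evaluations of a
// string-literal yield the same object is unspecified, so a pointer
// comparison such as greeting() == greeting() is not a reliable test.
bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// The element type is const, so a conforming program cannot modify the
// object without a cast; doing so anyway has undefined behavior.
```

A call such as `same_text(greeting(), "hello")` is expected to return true regardless of how the implementation shares or duplicates string literal objects.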

String literal objects are initialized with the sequence of code unit values corresponding to the string-literal's sequence of s-chars (originally from non-raw string literals) and r-chars (originally from raw string literals), plus a terminating U+0000 null character, in order as follows: