5 Lexical conventions [lex]
5.3 Character sets [lex.charset]
The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
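As a sketch, the membership test below encodes the 96 characters listed above by their ISO/IEC 10646 code points, that is, the ASCII subset that the glyphs are intended to identify. The names `kBasicGraphical` and `is_basic_source_char` are hypothetical and not part of any library; a C++17 compiler is assumed.

```cpp
#include <string_view>

// The 91 graphical characters listed above, as ISO/IEC 10646 code points.
constexpr std::u32string_view kBasicGraphical =
    U"abcdefghijklmnopqrstuvwxyz"
    U"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    U"0123456789"
    U"_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

static_assert(kBasicGraphical.size() == 91, "26 + 26 + 10 + 29 graphical characters");

// Hypothetical helper: true if the code point cp is one of the 96 members
// of the basic source character set.
constexpr bool is_basic_source_char(char32_t cp) {
    // The space character plus horizontal tab, vertical tab, form feed, new-line.
    if (cp == U' ' || cp == U'\t' || cp == U'\v' || cp == U'\f' || cp == U'\n')
        return true;
    return kBasicGraphical.find(cp) != std::u32string_view::npos;
}
```

Counting the matches over the 128 ASCII code points yields exactly 96; characters such as `@`, `$`, and the backtick are not members.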
The universal-character-name construct provides a way to name other characters.
hex-quad:
    hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
universal-character-name:
    \u hex-quad
    \U hex-quad hex-quad
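A minimal sketch of recognizing the two universal-character-name forms, assuming the input string is exactly one candidate name. The helpers `hex_value` and `parse_ucn` are hypothetical; the sketch checks only the grammar and deliberately does not reject values that fail to be code points. Hexadecimal letters are looked up in a table because, unlike the decimal digits, the letters are not guaranteed to have consecutive values.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: value of one hexadecimal-digit, or -1 if c is not one.
constexpr int hex_value(char c) {
    constexpr std::string_view lower = "0123456789abcdef";
    constexpr std::string_view upper = "0123456789ABCDEF";
    if (auto i = lower.find(c); i != std::string_view::npos) return static_cast<int>(i);
    if (auto i = upper.find(c); i != std::string_view::npos) return static_cast<int>(i);
    return -1;
}

// Hypothetical helper matching the grammar exactly:
//   \u hex-quad   (4 digits)   or   \U hex-quad hex-quad   (8 digits).
// Returns the hexadecimal number, or nullopt if s does not match.
constexpr std::optional<std::uint32_t> parse_ucn(std::string_view s) {
    if (s.size() < 2 || s[0] != '\\') return std::nullopt;
    std::size_t digits = 0;
    if (s[1] == 'u') digits = 4;
    else if (s[1] == 'U') digits = 8;
    else return std::nullopt;
    if (s.size() != 2 + digits) return std::nullopt;
    std::uint32_t value = 0;
    for (std::size_t i = 2; i < s.size(); ++i) {
        int d = hex_value(s[i]);
        if (d < 0) return std::nullopt;
        value = value * 16 + static_cast<std::uint32_t>(d);
    }
    return value;
}
```

For example, `\u00E9` yields 0xE9 and `\U0001F600` yields 0x1F600, while `\U00E9` is rejected because the long form requires eight digits.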
A universal-character-name designates the character in ISO/IEC 10646 (if any) whose code point is the hexadecimal number represented by the sequence of hexadecimal-digits in the universal-character-name.
The program is ill-formed if that number is not a code point or if it is a surrogate code point.
Noncharacter code points and reserved code points are considered to designate separate characters distinct from any ISO/IEC 10646 character.
[ Note:
ISO/IEC 10646 code points are integers in the range [0, 10FFFF] (hexadecimal).
A surrogate code point is a value in the range [D800, DFFF] (hexadecimal).
A control character is a character whose code point is in either of the ranges [0, 1F] or [7F, 9F] (hexadecimal).
— end note
]
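The validity rule for universal-character-names can be expressed as simple range checks on the hexadecimal value, using the ISO/IEC 10646 ranges: code points are 0 through 10FFFF, surrogate code points D800 through DFFF, and control characters 0 through 1F and 7F through 9F. This is a sketch; all helper names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers encoding the ranges above (hexadecimal).
constexpr bool is_code_point(std::uint32_t v) { return v <= 0x10FFFF; }
constexpr bool is_surrogate(std::uint32_t v) { return v >= 0xD800 && v <= 0xDFFF; }
constexpr bool is_control(std::uint32_t v) {
    return v <= 0x1F || (v >= 0x7F && v <= 0x9F);
}

// A universal-character-name whose value fails this check makes the program
// ill-formed. Noncharacters (e.g. FFFF) and reserved code points pass: they
// designate characters distinct from any ISO/IEC 10646 character.
constexpr bool ucn_value_is_valid(std::uint32_t v) {
    return is_code_point(v) && !is_surrogate(v);
}
```

Under this rule a literal such as `U'\uD800'` makes the program ill-formed, whereas `\uFFFF` is accepted.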
The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose value is 0.
For each basic execution character set, the values of the members shall be non-negative and distinct from one another.
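The requirements on the basic execution character set can be spot-checked at compile time. This sketch, assuming a C++17 compiler, lists the members other than the null character in a hypothetical constant `kBasicExecution` and tests them with a hypothetical helper `nonnegative_and_distinct`; the check should hold on any conforming implementation, including those where plain `char` is signed.

```cpp
#include <cstddef>
#include <string_view>

// The 96 basic source characters plus alert, backspace, and carriage return
// (the null character is checked separately below).
constexpr std::string_view kBasicExecution =
    " \t\v\f\n\a\b\r"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

static_assert(kBasicExecution.size() == 99, "96 source members + 3 control characters");
static_assert('\0' == 0 && L'\0' == 0, "null character and null wide character are 0");

// Hypothetical helper: true if every value in s is non-negative and no value repeats.
constexpr bool nonnegative_and_distinct(std::string_view s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] < 0) return false;
        for (std::size_t j = i + 1; j < s.size(); ++j)
            if (s[i] == s[j]) return false;
    }
    return true;
}
```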
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
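Because the decimal digits are required to have consecutive values, `c - '0'` gives a digit's numeric value on every implementation. The hypothetical `parse_decimal` below relies on nothing else; it is a sketch that omits overflow checking.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper: parse a non-empty string of decimal digits.
// The range test and c - '0' both rely only on '0'..'9' being consecutive.
// No overflow check: very long inputs wrap around modulo 2^N.
constexpr std::optional<unsigned> parse_decimal(std::string_view s) {
    if (s.empty()) return std::nullopt;
    unsigned value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return std::nullopt;
        value = value * 10 + static_cast<unsigned>(c - '0');
    }
    return value;
}
```

No comparable guarantee exists for letters: in EBCDIC, for instance, there are gaps between `i` and `j`, so an expression like `c - 'a'` is not a portable letter index.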
The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively.
The values of the members of the execution character sets and the sets of additional members are locale-specific.
The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set.
However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.