5 Lexical conventions [lex]

5.3 Character sets [lex.charset]

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:

a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
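As an illustration only, the membership rule above can be sketched as a lookup table. The names `kBasicSourceChars` and `is_basic_source_char` are hypothetical and not part of any library; the sketch assumes C++17 and assumes the implementation maps the escape sequences `\t`, `\v`, `\f`, and `\n` to the four control characters named above.

```cpp
#include <string_view>

// Hypothetical table: the 91 graphical characters listed above, followed by
// space, horizontal tab, vertical tab, form feed, and new-line.
constexpr std::string_view kBasicSourceChars =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'"
    " \t\v\f\n";

// 91 graphical characters + 5 others = 96, matching the count in the text.
static_assert(kBasicSourceChars.size() == 96,
              "table must hold exactly the 96 basic source characters");

// Returns true if c is a member of the basic source character set.
constexpr bool is_basic_source_char(char c) {
    return kBasicSourceChars.find(c) != std::string_view::npos;
}
```

Characters such as `@`, `$`, and the grave accent are not in the list, so a check like this would reject them.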

The universal-character-name construct provides a way to name other characters.

hex-quad:
    hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

universal-character-name:
    \u hex-quad
    \U hex-quad hex-quad
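The grammar above can be read as "a backslash, then `u` and four hexadecimal digits, or `U` and eight". A minimal sketch of a recognizer for that shape follows, assuming C++17; `hex_digit_value` and `parse_ucn` are hypothetical helper names, and the function only computes the hexadecimal number without checking whether it is a usable code point.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Value of one hexadecimal-digit, or -1 if c is not one. Table lookup is
// used because only the decimal digits are guaranteed to be contiguous.
constexpr int hex_digit_value(char c) {
    constexpr std::string_view lower = "0123456789abcdef";
    constexpr std::string_view upper = "0123456789ABCDEF";
    std::size_t i = lower.find(c);
    if (i == std::string_view::npos) i = upper.find(c);
    return i == std::string_view::npos ? -1 : static_cast<int>(i);
}

// Parses a complete universal-character-name (\u + 4 digits or \U + 8
// digits) and returns the number it spells, or nullopt on a mismatch.
std::optional<std::uint32_t> parse_ucn(std::string_view s) {
    if (s.size() < 2 || s[0] != '\\') return std::nullopt;
    std::size_t digits = 0;
    if (s[1] == 'u') digits = 4;        // one hex-quad
    else if (s[1] == 'U') digits = 8;   // two hex-quads
    else return std::nullopt;
    if (s.size() != 2 + digits) return std::nullopt;

    std::uint32_t value = 0;
    for (char c : s.substr(2)) {
        int d = hex_digit_value(c);
        if (d < 0) return std::nullopt;
        value = value * 16 + static_cast<std::uint32_t>(d);
    }
    return value;
}
```

Under these assumptions, the six-character text consisting of a backslash followed by `u00E9` yields the number E9 (hexadecimal).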

A universal-character-name designates the character in ISO/IEC 10646 (if any) whose code point is the hexadecimal number represented by the sequence of hexadecimal-digits in the universal-character-name.

The program is ill-formed if that number is not a code point or if it is a surrogate code point.

Noncharacter code points and reserved code points are considered to designate separate characters distinct from any ISO/IEC 10646 character.

[ Note: ISO/IEC 10646 code points are integers in the range [0, 10FFFF] (hexadecimal). A surrogate code point is a value in the range [D800, DFFF] (hexadecimal). A control character is a character whose code point is in either of the ranges [0, 1F] or [7F, 9F] (hexadecimal). end note ]
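The well-formedness rule for the number named by a universal-character-name can be sketched as a few range checks. This is an illustration, not normative text: the function names are hypothetical, and the bounds used are the ISO/IEC 10646 ranges (code points 0 to 10FFFF, surrogates D800 to DFFF, control characters 0 to 1F and 7F to 9F, all hexadecimal).

```cpp
#include <cstdint>

// ISO/IEC 10646 code points are the integers 0 through 0x10FFFF.
constexpr bool is_code_point(std::uint32_t v) {
    return v <= 0x10FFFF;
}

// Surrogate code points occupy 0xD800 through 0xDFFF.
constexpr bool is_surrogate(std::uint32_t v) {
    return v >= 0xD800 && v <= 0xDFFF;
}

// Control characters occupy 0 through 0x1F and 0x7F through 0x9F.
constexpr bool is_control(std::uint32_t v) {
    return v <= 0x1F || (v >= 0x7F && v <= 0x9F);
}

// Only the rule stated in this subclause: the number must be a code point
// and must not be a surrogate code point.
constexpr bool ucn_value_is_well_formed(std::uint32_t v) {
    return is_code_point(v) && !is_surrogate(v);
}

static_assert(ucn_value_is_well_formed(0x00E9), "ordinary code point");
static_assert(!ucn_value_is_well_formed(0xD800), "surrogate is rejected");
static_assert(!ucn_value_is_well_formed(0x110000), "beyond the code space");
```

Noncharacter code points such as FFFF (hexadecimal) pass this check, consistent with the statement that they designate separate characters rather than making the program ill-formed.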

The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose value is 0.

For each basic execution character set, the values of the members shall be non-negative and distinct from one another.
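These requirements can be spot-checked at compile time. The sketch below assumes C++17 and a conforming implementation; `kSample` and `non_negative_and_distinct` are hypothetical names, and the sample covers only part of the basic execution character set (it includes the alert, backspace, and carriage return characters).

```cpp
#include <cstddef>
#include <string_view>

// The null character and null wide character have value 0.
static_assert('\0' == 0, "null character has value 0");
static_assert(L'\0' == 0, "null wide character has value 0");

// Returns true if every character in s is non-negative and no character
// occurs twice.
constexpr bool non_negative_and_distinct(std::string_view s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] < 0) return false;
        for (std::size_t j = i + 1; j < s.size(); ++j) {
            if (s[i] == s[j]) return false;
        }
    }
    return true;
}

// Hypothetical sample: letters, digits, space, and several control characters.
constexpr std::string_view kSample =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    " \t\v\f\n\a\b\r";

static_assert(non_negative_and_distinct(kSample),
              "basic execution characters are non-negative and distinct");
```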

In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
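Because the decimal digits are required to be consecutive, converting between a digit character and its numeric value by offsetting from `'0'` is portable. A minimal sketch, with hypothetical helper names:

```cpp
// Numeric value of a decimal digit character, or -1 if c is not a digit.
// Relies on '0'..'9' having consecutive values.
constexpr int digit_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

// Digit character for a value d; the caller must ensure 0 <= d <= 9.
constexpr char digit_char(int d) {
    return static_cast<char>('0' + d);
}

static_assert('1' - '0' == 1, "digits are consecutive");
static_assert('9' - '0' == 9, "digits are consecutive");
```

No comparable guarantee is stated here for letters, so the same offset technique applied to `'a'` is not covered by this requirement.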

The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively.

The values of the members of the execution character sets and the sets of additional members are locale-specific.

The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set.

However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.