5 Lexical conventions [lex]
5.2 Phases of translation [lex.phases]
The precedence among the syntax rules of translation is specified by the following phases.
- 1.
Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary.
The set of physical source file characters accepted is implementation-defined.
An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted ([lex.pptoken]) in a raw string literal.
- 2.
Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines.
Only the last backslash on any physical source line shall be eligible for being part of such a splice.
Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined.
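As an illustration of line splicing, the sketch below spreads one macro definition over several physical source lines. The names SUM_OF_THREE and sum_of_three_demo are hypothetical and exist only for this example; each backslash must be the last character on its physical line for the splice to apply.

```cpp
#include <cassert>

// Each backslash-newline pair below is deleted, so the four physical
// lines form a single logical line before the directive is executed.
#define SUM_OF_THREE(a, b, c) \
    ((a) +                    \
     (b) +                    \
     (c))

// Hypothetical helper used only to observe the spliced definition.
int sum_of_three_demo() {
    const int result = SUM_OF_THREE(1, 2, 3);
    assert(result == 6);
    return result;
}
```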
A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file.
- 3.
The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments).
A source file shall not end in a partial preprocessing token or in a partial comment.
Each comment is replaced by one space character.
New-line characters are retained.
Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is unspecified.
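One way to observe that a comment becomes a space rather than vanishing is to stringize a macro argument that contains one. This is a minimal sketch: STRINGIZE and comment_as_space_demo are hypothetical names, and the result relies on the # operator ([cpp.stringize]) rendering white space between preprocessing tokens as a single space.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper macro: turns its argument into a string literal.
#define STRINGIZE(x) #x

// The comment between a and b is replaced by one space, so a and b
// remain two separate preprocessing tokens instead of merging into ab.
const char* comment_as_space_demo() {
    const char* text = STRINGIZE(a/* comment */b);
    assert(std::strcmp(text, "ab") != 0);
    return text;
}
```

Under these assumptions the function returns the string "a b".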
The process of dividing a source file's characters into preprocessing tokens is context-dependent.
[ Example
:
See the handling of < within a #include preprocessing directive.
— end example
]
- 4.
Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed.
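A small sketch of directive execution and macro expansion follows. BUFFER_SIZE, selected_size, and selected_size_demo are hypothetical names chosen for illustration; only the branch selected by the #if directive reaches the later phases.

```cpp
#include <cassert>

#define BUFFER_SIZE 16   // hypothetical object-like macro

// The #if directive is executed here: BUFFER_SIZE expands to 16, the
// condition holds, and the #else branch is discarded.
#if BUFFER_SIZE > 8
constexpr int selected_size = BUFFER_SIZE * 2;
#else
constexpr int selected_size = BUFFER_SIZE;
#endif

int selected_size_demo() {
    assert(selected_size == 32);
    return selected_size;
}
```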
A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively.
All preprocessing directives are then deleted.
- 5.
Each basic source character set member in a character-literal or a string-literal, as well as each escape sequence and universal-character-name in a character-literal or a non-raw string literal, is converted to the corresponding member of the execution character set ([lex.ccon], [lex.string]); if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.
- 6.
Adjacent string literal tokens are concatenated.
- 7.
White-space characters separating tokens are no longer significant.
Each preprocessing token is converted into a token ([lex.token]).
The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
[ Note
:
The process of analyzing and translating the tokens may occasionally result in one token being replaced by a sequence of other tokens ([temp.names]).
— end note
]
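The most familiar case of such a replacement is a closing >> in a nested template-argument-list. In the sketch below, grid_cell_count is a hypothetical function written only to show the construct.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The closing >> is a single right-shift token after tokenization;
// while the template-argument-list is parsed it is treated as two
// consecutive > tokens ([temp.names]).
std::size_t grid_cell_count() {
    std::vector<std::vector<int>> grid(2, std::vector<int>(3, 0));
    assert(grid.size() == 2);
    return grid.size() * grid[0].size();
}
```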
It is implementation-defined whether the sources for module units and header units on which the current translation unit has an interface dependency ([module.unit], [module.import]) are required to be available.
[ Note
:
Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation.
The description is conceptual only, and does not specify any particular implementation.
— end note
]
- 8.
Translated translation units and instantiation units are combined as follows:
[ Note
: Some or all of these may be supplied from a library. — end note
]
Each translated translation unit is examined to produce a list of required instantiations.
[ Note
:
This may include instantiations which have been explicitly requested ([temp.explicit]).
— end note
]
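A minimal sketch of both kinds of required instantiation, using a hypothetical function template twice: one specialization is requested explicitly, the other is required implicitly by a call.

```cpp
#include <cassert>

template <typename T>
T twice(T value) {
    return value + value;
}

// Explicit instantiation definition ([temp.explicit]): twice<int> is
// requested whether or not this translation unit otherwise uses it.
template int twice<int>(int);

// The call twice(1.5) requires twice<double>, which is instantiated
// implicitly.
double twice_demo() {
    assert(twice(21) == 42);
    return twice(1.5);
}
```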
The definitions of the required templates are located.
It is implementation-defined whether the source of the translation units containing these definitions is required to be available.
[ Note
:
An implementation could encode sufficient information into the translated translation unit so as to ensure the source is not required here.
— end note
]
All the required instantiations are performed to produce instantiation units.
[ Note
:
These are similar to translated translation units, but contain no references to uninstantiated templates and no template definitions.
— end note
]
The program is ill-formed if any instantiation fails.
- 9.
All external entity references are resolved.
Library components are linked to satisfy external references to entities not defined in the current translation.
All such translator output is collected into a program image which contains information needed for execution in its execution environment.
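The ordering of the phases is observable in ordinary code. As a sketch of phases 5 and 6 (the names concatenated, single_escape, and concatenation_demo are hypothetical), an escape sequence is converted within its own literal before adjacent literals are joined, so a character from the following literal is never absorbed into a hexadecimal escape.

```cpp
#include <cassert>

// The escape sequence \x1 is converted inside the first literal before
// the two literals are concatenated, so '2' stays a separate character.
constexpr char concatenated[] = "\x1" "2";   // '\x01', '2', '\0'
constexpr char single_escape[] = "\x12";     // '\x12', '\0'

static_assert(sizeof(concatenated) == 3, "two characters plus terminator");
static_assert(sizeof(single_escape) == 2, "one character plus terminator");

bool concatenation_demo() {
    assert(concatenated[0] == '\x01');
    assert(concatenated[1] == '2');
    return concatenated[0] != single_escape[0];
}
```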