Preprocessor: Embed Parameter Ordering
While the C and C++ Compatibility Study Group was working on the new standard #embed preprocessor parameter that mirrors the clang::offset(...) and gnu::offset(...) parameters, someone raised a concern that the order of the parameters may be confusing. The concern came up at the June 4th, 2025 meeting (anchored link).
Background
Throughout the rest of this text, clang::offset, gnu::offset, and the almost-standard offset parameter will be used interchangeably in prose. They represent the same preprocessor embed parameter, with the same semantics.
Similarly, a resource named <data.bin> is assumed to contain exactly 10 bytes, and is treated as such when used in an #embed directive.
The following two invocations of #embed are identical and produce exactly the same data:
#embed <data.bin> clang::offset(1) limit(3) /* ONE */
#embed <data.bin> limit(3) clang::offset(1) /* TWO */
However, some people questioned whether the difference in order might confuse users into thinking the two do not produce identical effects (e.g., leaving them unsure whether offset is always calculated first from the raw file size and limit applied afterward, or vice versa).
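As a rough illustration of why the written order cannot matter, here is a minimal model in C. It is hypothetical: embed_param, embed_range, and embed_evaluate are names invented for this sketch, not Clang or GCC internals. It first collects the parameter values in whatever order they were written, then applies them in one fixed order (offset first, limit second), assuming each parameter is written at most once.

```c
#include <assert.h> /* used by the usage checks */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only; not taken from any real implementation. */
typedef enum { EMBED_OFFSET, EMBED_LIMIT } embed_param_kind;

typedef struct {
    embed_param_kind kind;
    size_t value;
} embed_param;

/* The range of resource bytes that #embed would produce. */
typedef struct {
    size_t start;
    size_t count;
} embed_range;

static embed_range embed_evaluate(size_t resource_size,
                                  const embed_param *params, size_t n) {
    size_t offset = 0;
    size_t limit = SIZE_MAX; /* no limit written: take everything */

    /* Pass 1: record the values; the written order is irrelevant here. */
    for (size_t i = 0; i < n; ++i) {
        if (params[i].kind == EMBED_OFFSET)
            offset = params[i].value;
        else
            limit = params[i].value;
    }

    /* Pass 2: apply in a fixed order, offset first and then limit. */
    embed_range r;
    r.start = offset < resource_size ? offset : resource_size;
    size_t remaining = resource_size - r.start;
    r.count = limit < remaining ? limit : remaining;
    return r;
}
```

With a 10-byte resource, both the /* ONE */ and /* TWO */ parameter lists yield a start of 1 and a count of 3, i.e. bytes 1 through 3.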
The Core Proposal
Following from this, some people advocated for a warning or error when the parameters are written in the “wrong order”. Since limit always applies after offset, the proposal is for the standard to mandate that these parameters always be written in that order: /* ONE */ would be fine, but /* TWO */ should trigger an error.
It was then pointed out that, based on the standard wording, the same reasoning can also apply to other parameters. For example, limit(0) or offset(SIZE_MAX) can make a resource that has data be considered “empty”. In particular, using <data.bin> again:
#embed <data.bin> limit(0) if_empty("meow") /* THREE */
#embed <data.bin> if_empty("meow") limit(0) /* FOUR */
Under this reasoning, /* FOUR */ should issue a diagnostic, since if_empty is written before the limit(0) that turns the resource empty, while /* THREE */ would issue no diagnostic. This led to the formulation of the following guidance:
1. offset must appear before limit.
2. limit and/or offset must appear before any of prefix, suffix, or if_empty.
We are asking implementers how they feel about the two rules above, and about implementing them.
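To give a sense of what implementing the two rules might involve, here is a sketch of a purely syntactic check over the parameters in the order they were written. It is hypothetical: param_kind and order_ok are invented names, vendor parameters other than the offset spellings are not modeled, and a real implementation would presumably emit a diagnostic at the offending parameter rather than return a boolean.

```c
#include <assert.h> /* used by the usage checks */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parameter kinds, for illustration only. */
typedef enum {
    P_OFFSET,  /* offset, clang::offset, or gnu::offset */
    P_LIMIT,
    P_PREFIX,
    P_SUFFIX,
    P_IF_EMPTY
} param_kind;

/* Returns true if the written order satisfies both proposed rules:
   1. offset appears before limit;
   2. limit and/or offset appear before prefix, suffix, and if_empty. */
static bool order_ok(const param_kind *params, size_t n) {
    bool seen_limit = false;
    bool seen_other = false; /* any of prefix, suffix, if_empty */
    for (size_t i = 0; i < n; ++i) {
        switch (params[i]) {
        case P_OFFSET:
            if (seen_limit || seen_other)
                return false; /* violates rule 1 or rule 2 */
            break;
        case P_LIMIT:
            if (seen_other)
                return false; /* violates rule 2 */
            seen_limit = true;
            break;
        default:
            seen_other = true;
            break;
        }
    }
    return true;
}
```

Under this check, the parameter lists of /* ONE */ and /* THREE */ pass, while /* TWO */ and /* FOUR */ fail. Because it only looks at the written order and never at the resource, it would flag /* FOUR */ regardless of what the resource actually contains.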
To be extremely clear: offset, clang::offset, and gnu::offset always apply before the standard limit(...) parameter, both in the wording and in all real implementations, but none of them imposes any order on how the parameters are written.
To be clearer still: an ordering requirement is not how C23 specified #embed, and not how C++ has standardized it so far. As #embed’s principal author, and the person who has carried it through standardization over the last 7 years, I can say nobody has really come forward to tell me this was confusing or harmful, but that may simply be selection bias, or simply that nobody has spoken up.
We note that some of this is odd. Consider again the /* FOUR */ case from before:
#embed <data.bin> if_empty("meow") limit(0) /* FOUR */
If <data.bin> were an empty resource, would that mean the preceding if_empty is fine, because limit(0) would not have any effect anyway? In an obvious sense, the diagnostic would apply anyway, but this is one of those things where I personally did not believe anyone would advocate for ordering requirements either way, so now I feel I have to ask whether this is a quality-of-implementation issue anyone would care about in the first place. This is, again, in the face of the fact that the order of the parameters does not matter on any implementation, and that nobody asked me, either in the run-up to standardization or since, whether this should be a thing.
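The underlying semantics can be modeled in a few lines. This sketch is hypothetical (embed_remaining and embed_uses_if_empty are invented names, not implementation internals) and assumes offset is applied before limit, as described above: whether the if_empty tokens are used depends only on how many bytes remain, never on where if_empty was written.

```c
#include <assert.h> /* used by the usage checks */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: bytes left after offset, then limit, are applied. */
static size_t embed_remaining(size_t resource_size, size_t offset,
                              size_t limit) {
    size_t start = offset < resource_size ? offset : resource_size;
    size_t remaining = resource_size - start;
    return limit < remaining ? limit : remaining;
}

/* if_empty's tokens are used exactly when nothing remains (prefix and
   suffix are used only when something does). Parameter order plays no
   part in this decision. */
static bool embed_uses_if_empty(size_t resource_size, size_t offset,
                                size_t limit) {
    return embed_remaining(resource_size, offset, limit) == 0;
}
```

For a 10-byte resource, limit(0) and offset(SIZE_MAX) each leave zero bytes, so if_empty applies; for an already-empty resource, adding limit(0) changes nothing, which is exactly the situation described for /* FOUR */.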
The Questions
Therefore, we’d like to poll the Clang community:
1. Does anyone think a diagnostic on the order will help prevent confusion among users, even though the semantics never change between invocations regardless of parameter order?
2. If the answer to (1) is yes, do we believe it should be a warning (recommended practice in Standard Speak) or an error (a Constraint Violation/Ill-Formed in Standard Speak)?
Sub-questions such as “an error, but only in pedantic mode” and similar can be golfed and bikeshedded after answering the first two questions.
A formalization of these semantics will be presented to WG21 and WG14 at some point. I’m gathering feedback from implementers, and their willingness to change their existing implementations, to formulate a new paper: P3731R0: #embed Preprocessor Parameter Order
Thank you for reading,
Björkus