repr(int) fieldless enums are ABI-compatible with int by RalfJung · Pull Request #128600 · rust-lang/rust (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation32 Commits2 Checks6 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

Given that we already allow mixing char and u32, as well as usize/isize and the corresponding fixed-size type, it seems odd to not also do this for enums. However, if we declare this legal, then Rust code may start relying on it, so arguably CFI should also allow this, which could miss some real bugs where the mismatch is accidental.

So we have to decide -- do we say this is ABI-compatible, or do we declare this to be "erroneous behavior": similar to overflow in (non-wrapping) arithmetic, such behavior is always a bug, but implementations are not required to detect the bug (with a panic or abort), they can also continue execution in some well-defined way. (For the ABI question, that well-defined way is to transmute the argument from the caller type to the callee type. This is well-defined in the sense that we say exactly what happens, but it may be UB if the caller uses e.g. u32 and the callee expects a repr(u32) enum for which the chosen argument value is not valid.)

I am not sure if we are already guaranteeing anything like this somewhere. (The nomicon talks about these repr flags affecting the ABI, but the nomicon is not normative, and also it's not entirely clear whether ABI here also includes function call concerns.)
Cc @rust-lang/lang @rust-lang/opsem

This is part of a wider set of discussions around CFI vs ABI compatibility:

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot added S-waiting-on-review

Status: Awaiting review from the assignee but also interested parties.

T-compiler

Relevant to the compiler team, which will review and decide on the PR/issue.

T-libs

Relevant to the library team, which will review and decide on the PR/issue.

labels

Aug 3, 2024

The Miri subtree was changed

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

@maurer do you have any concerns with this guarantee from a CFI perspective? Does CFI accept or reject calls where caller and callee disagree on whether to use a repr(iN) enum or the corresponding integer type?

Your change as it stands won't break anything, but it also won't make CFI across ABI-compatible enums succeed, so it would have the downside that there would be more behavior that is "legal Rust but not CFI legal", which we generally want to minimize as much as possible.

This does go against how CFI works in C/C++ - there, it's not legal to perform the function pointer casts you're legalizing here, so it would be different. It also does lessen the protections somewhat, as distinct enums with the same #[repr] will not be considered different type sfor defense purposes.

The tag for fn foo(x: MyEnum);, will currently encode MyEnum as an opaque type, so it'll be the equivalent to the C signature void (*)(MyEnum); and be processed that way.

If we want this ABI compatibility to extend to CFI, we should add a case to the transform which changes #[repr({u,i}N)] enums into {u,i}N during the type transformation before consumption.

Thanks!

Given that I've not touched the CFI code before, I'd prefer if you could take care of that -- either as a parallel PR (I could then cherry-pick your patches) or as a follow-up.

OK. This does technically weaken CFI, as it means that in this case:

#[repr(u8)]
enum A {
  X = 0,
}

#[repr(u8)]
enum B {
  Y = 1,
}

fn foo(x: A) {
// ...
}
fn bar(x: B) {
// ...
}

fn baz(f: fn(x: A)) {
  f(A::X)
}

If an attacker got control over the baz parameter, they could send it to bar instead of foo. Prior to this change, CFI could prevent this.

I'm a firm believer that CFI should only enforce the language rules, so if we're going to flatten enums like this for purposes of function pointer casts, we'll need to weaken CFI like this, but I do want to point out that it will weaken CFI in Rust. It might weaken it enough that it might make sense to have this behavior flagged.

cc @rcvalle to avoid any surprise conflicts when it comes to implementation.

The tl;dr is that Ralf is legalizing this cast:

#[repr(i8)]
enum A {}
#[repr(i8)]
enum B {X = 0}
fn f(x: fn(A)) {
   let y: fn(B) = unsafe { std::mem::transmute(x) }
   y(B::X)
}

and we'd have to weaken (possibly behind a flag) the Rust CFI support so that it doesn't trigger on it.

Without my CFI hat on, what is your plan for the semantics of such casts in the event that they receive an out-of-range value for the enum?

I'm a firm believer that CFI should only enforce the language rules, so if we're going to flatten enums like this for purposes of function pointer casts, we'll need to weaken CFI like this, but I do want to point out that it will weaken CFI in Rust. It might weaken it enough that it might make sense to have this behavior flagged.

Right, that's why this is flagged for t-lang discussion and it is why I asked for your input for that discussion. :) We could also declare this to be "erroneous behavior", where we say that if this happens it is considered a bug, but detecting the bug is not required by implementations. If the bug is not detected, the call proceeds as-if there was a transmute.

@chorman0773 suggested this should be documented as guaranteed; not sure if they have a concrete usecase in mind or whether it just seems "obvious". Note that we already declare NonZeroI32 and i32 to be compatible which can also lead to UB when someone passes zero that way.

Without my CFI hat on, what is your plan for the semantics of such casts in the event that they receive an out-of-range value for the enum?

That's already documented as part of the general ABI compatibility rules: it acts like a transmute, so for out-of-range values the call is still UB.

As with char and u32 we could add this to the integer normalization option for CFI, but for non extern "C" functions only; otherwise, it'd break cross-language CFI support. @RalfJung does the ABI provide any guarantees about cross-language compatibility of those?

Shouldn't x-lang be a non-issue, as those all need to be #[repr(C)] if the enum is present in C, which is excluded from this flattening?

The ABI compatibility rules apply to all ABIs, including extern "C". So this should be accepted for extern "C" as well if the PR lands. (And the existing compatible pairs like char-u32 or i32-NonZeroI32, should already be accepted over extern "C".)

does the ABI provide any guarantees about cross-language compatibility of those?

AFAIK we guarantee that i32 is ABI-compatible with int32_t etc, though I do not know where this is documented exactly. By transitivity this makes a repr(i32) enum (and NonZeroI32) ABI-compatible with int32_t (for any ABI string).

@RalfJung would, for example, #[repr(C, u8)] and #[repr(u8)] be ABI compatible?

@RalfJung would, for example, #[repr(C, u8)] and #[repr(u8)] be ABI compatible?

That is an odd combination. The docs say

For field-less enums, the C representation has the size and alignment of the default enum size and alignment for the target platform's C ABI.

but for repr(C, u8) this can't be right. So seems to me like the C is basically a NOP here so I would not expect it to affect layout or ABI in any way. (We are only talking about fieldless enums here.)

If #[repr(C, u8)] is legal, I think it would be correct to treat the C as a suppression indicator for this - if the other language is C++, two distinct enum types which have the same representation are still assumed non-aliasing when behind a pointer, so flattening them to make them compatible seems weird.

I don't see how the C/C++ pointer aliasing rules should be relevant here. We aren't even talking about pointer types here, and furthermore Rust deliberately does not want C/C++-style "strict aliasing" rules, so Rust-to-Rust calls should not be subject to such rules even when they go via extern "C". (Rust also already declares all pointer types to be mutually ABI compatible.)

This PR is entirely about by-values passing of enums, let's stick to that in the discussion here.

but for repr(C, u8) this can't be right. So seems to me like the C is basically a NOP here so I would not expect it to affect layout or ABI in any way. (We are only talking about fieldless enums here.)

If that is the case, I think we could use @maurer's suggestion and add it to the integer normalization option for CFI for non #[repr(C)] enums only.

I said the C should be a NOP, i.e. both repr(u8) and repr(C, u8) should be compatible with u8 if the enum is fieidless. So, they should both be integer normalized.

Aliasing is relevant here because CFI is essentially about labeling function pointers based on aliasing rules.

If I make a function pointer out of a Rust extern "C" fn foo(x: SomeEnum) and SomeEnum is flattened despite being #[repr(C)], and pass the result to C++, and C++ expects that function pointer to take a SomeEnum, when C++ tries to call in, it'll fail.

I'm not suggesting we add strict-aliasing semantics to Rust, but make it possible for Rust to communicate with a language which assumes it.

Could we filter out by #[repr(C], and non #[repr(C, X)] and fieldless? Is #[repr(C, X)] and fieldless expected to be used cross-language (i.e., with FFI or across the FFI boundary)?

@RalfJung I was mostly asking a general question as I was formalizing the abi compat rules - whether or not #[repr(Int)] enum Foo { Var0 = 1, ...} (where Foo has no fields) is #[repr(C)] struct Foo(Int);, #[repr(transparent)] struct Foo(Int); or something else entirely for abi purposes is unclear.
My more general use is compatibility between a fieldless enum and the appropriate C/C++ enum type, which I do rely on when writing C++ bindings, and currently writing my crabi++ library in C++. I believe that, except for CFI, this matches the underlying type on all abis I'm aware of (in particular, it doesn't have aggregate abi on arm, where that matters).

The implementation looks fine, though I'm also curious on the answer to

Without my CFI hat on, what is your plan for the semantics of such casts in the event that they receive an out-of-range value for the enum?

On the compiler side, we seem to separate the caller ty and the callee ty, so it would be possible to make this one-direction only. (i.e. legalize only fn(i32) casted to fn(A) where A is repr(i32) but not the other way around)

The implementation looks fine, though I'm also curious on the answer to

I have already answered this above and it is answered in the stable docs:

If the declared signature and the signature of the function pointer are ABI-compatible, then the function call behaves as if every argument was transmuted from the type in the function pointer to the type at the function declaration, and the return value is transmuted from the type in the declaration to the type in the pointer. All the usual caveats and concerns around transmutation apply; for instance, if the function expects a NonZero and the function pointer uses the ABI-compatible type Option, and the value used for the argument is None, then this call is Undefined Behavior since transmuting None:: to NonZero violates the non-zero requirement.

Could we filter out by #[repr(C], and non #[repr(C, X)] and fieldless? Is #[repr(C, X)] and fieldless expected to be used cross-language (i.e., with FFI or across the FFI boundary)?

I don't understand what you mean by "filter out".

However, when I try to use this repr I can an error:

error[E0566]: conflicting representation hints
  --> tests/pass/function_calls/abi_compat.rs:83:12
   |
LL |     #[repr(C, u64)]
   |            ^  ^^^
   |
   = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
   = note: for more information, see issue #68585 <https://github.com/rust-lang/rust/issues/68585>
   = note: `#[deny(conflicting_repr_hints)]` on by default

So I think we do not have to worry about repr(C, u64).

My more general use is compatibility between a fieldless enum and the appropriate C/C++ enum type

That seems like something to be added to some sort of FFI documentation explaining our cross-language ABI compatibility rules.

If I make a function pointer out of a Rust extern "C" fn foo(x: SomeEnum) and SomeEnum is flattened despite being #[repr(C)], and pass the result to C++, and C++ expects that function pointer to take a SomeEnum, when C++ tries to call in, it'll fail.

Isn't this a problem with all kinds of "normalization", e.g. int vs int32_t being the same type in Rust but potentially considered different types in C++ that must not be mixed up across function call boundaries? Or am I misunderstanding something?

Marking as waiting on team.

I'm in favor of #[repr(int)] enum being ABI/CFI compatible with int. Imo if transmute-by-abi is allowed for #[repr(transparent)], it only makes sense to also allow it for fieldless #[repr(int)]. You can even make a decent argument that dataful #[repr(int)] have their ABI structurally defined as #[repr(C)] union per the repr RFC.

C says that enumerated types are compatible with their underlying type. That would suggest that we should make repr(int) ABI/CFI compatible with int. However, C's concept of type compatibility is not transitive. So C allows the declaration void f(enum E) to link to the definition void f(int), but not to the definition void f(enum Q).

I'm also not sure exactly how compatibility is determined between C and Rust. Do we ignore the tag equivalence requirement so that enum E can be compatible with enum _RNtC3lib1E? Since Rust doesn't have #[no_mangle] for types, the only answer I can come up with is that Rust uses a separate tag namespace from C and a tag in one namespace is compatible with all tag names1 in the other namespace.

At an absolute minimum, C enum needs to be CFI compatible with Rust iN. Given a C enum that is used as bitflags, the proper translation into Rust isn't #[repr(int)] enum E { … } but #[repr(transparent)] struct E(int); so that all values are valid, not just the enumerated ones.

Footnotes

Note that I'm using tag name to refer to only the name; types must also be structurally compatible to be considered compatible. ↩

traviscross added the T-lang

Relevant to the language team, which will review and decide on the PR/issue.

label

Aug 8, 2024

Labels

S-waiting-on-team

Status: Awaiting decision from the relevant subteam (see the T- label).

T-compiler

Relevant to the compiler team, which will review and decide on the PR/issue.

T-lang

Relevant to the language team, which will review and decide on the PR/issue.

T-libs

Relevant to the library team, which will review and decide on the PR/issue.