Decide on when MIR Discriminant() operation is UB (original) (raw)

We do not currently have a clear description of what the semantics of the Discriminant() MIR operation, and the corresponding intrinsic (exposed via mem::discriminant()), are -- specifically, what are the safety preconditions of this operation, and when is it UB?

Note that this operation works on all types, not just enums. For valid values of non non-enum types it returns some valid integer value (currently, 0).

The implementation in Miri (to be restored with #91088) does the minimum amount of work necessary to determine the discriminant: if the type has no discriminant (since there are not at least 2 variants), the operation is always defined; otherwise it reads the tag (which encodes the discriminant) and causes UB if that is uninitialized or does not encode a valid discriminant. (There are some thorny question here around what happens if the discriminant has provenance; I would like to keep that out of scope for this issue -- it should likely be treated like a ptr-to-int transmute, whatever we end up doing with that: rust-lang/unsafe-code-guidelines#286.)

The codegen backend adds some extra UB for the case where the type is uninhabited:

/// Obtain the actual discriminant of a value.
pub fn codegen_get_discr<Bx: BuilderMethods<'a, 'tcx, Value = V>>(
self,
bx: &mut Bx,
cast_to: Ty<'tcx>,
) -> V {
let cast_to = bx.cx().immediate_backend_type(bx.cx().layout_of(cast_to));
if self.layout.abi.is_uninhabited() {
return bx.cx().const_undef(cast_to);
}

We also have a related MIR optimization in https://github.com/rust-lang/rust/blob/93542a8240c5f926ac5f3f99cef99366082f9c2b/compiler/rustc_mir_transform/src/uninhabited_enum_branching.rs. I am not quite sure what this does though, it seems to be more about assuming that if a particular enum variant is uninhabited then we will never see the discriminant for that variant, and can hence remove it from the SwitchInt?

An 'obvious' choice is to say that the value passed to the Discriminant operator must fully satisfy its validity invariant -- that would certainly justify both the MIR optimization and what the codegen backend does. However, this also has problems:

Drop elaboration generates Discriminant operations on partially moved-out-of enums (Safe function MIR reads discriminant of moved-out local #91029). Depending on the semantics of 'move' and whether validity invariants might take into account what a pointer points to (such as requiring that a Box be initialized), this might lead to calling Discriminant on invalid values.
To even check the validity invariant we need to read the full value, but there are legitimate situations where parts of that value are pointed to by an active mutable reference that makes uniqueness assumptions (Fix variant index / discriminant confusion in uninhabited enum branching #89764 (comment)). Doing a read violates those assumptions.

These observations make me doubtful that requiring full validity is the right thing. Making the fewest assumptions is appealing IMO, but not compatible with our codegen backend nor with the MIR optimizations -- the optimization seems to kick in even for operations of the form Discriminant(*ptr), so the validity invariant of ptr itself does not help either. It could be possible to strike some middle ground, but that feels like a rather ad-hoc adjustment to the current set of optimizations.

To summarize:

Miri implements "minimal UB", requiring just enough to actually be able to compute the discriminant. This is incompatible with what MIR optimizations and the codegen backend do.
A principled alternative would be requiring full validity of the value, but that is incompatible with code emitted for dropping, and with aliasing assumptions.
Some middle ground is probably possible but seems entirely ad-hoc.

Cc @wesleywiser @tmandry @rust-lang/wg-mir-opt