Commit to safety rules for dyn trait upcasting · Issue #101336 · rust-lang/rust (original) (raw)

This issue proposes a resolution to the last outstanding question blocking dyn upcast. It is also available on the initiative repository.

Background

We are trying to enable "upcasts" from a dyn Trait to its supertraits:

trait Foo { fn foo_method(&self); } trait Bar: Foo { fn bar_method(&self); }

let x: &dyn Bar = /* ... */; let y: &dyn Foo = x; // compiles

The key factor for these upcasts is that they require adjusting the vtable. The current implementation strategy is that the vtables for the Bar trait embed pointers to a Foo vtable within them:

+----------------------------+
| Bar vtable for some type T |
|----------------------------|
| Foo vtable                 | ----------> +------------------+
| foo_method                 | -----+      | Foo vtable for T |
| bar_method                 | --+  |      |------------------|
+----------------------------+   |  |      | foo_method       | ---+
                                 |  |      +------------------+    |
                                 |  |                              |
                                 |  +---> <T as Foo>::foo_method <-+
                                 v
                     <T as Bar>::bar_method
                 
(this diagram is only meant to convey the general idea of the vtable
 layout, and doesn't represent the exact offsets we would use etc;
 in fact, with the current implementation,
 the first supertrait is stored 'inline' and hence no load is required)

This way, given a &dyn Bar object, we convert its Bar vtable to the appropriate Foo vtable by loading the appropriate field.

Although we don't want to commit to a particular implementation strategy, we do want to leave room for this strategy. One implication is that performing an upcast may require loading from the vtable, which implies that the vtable must be a valid pointer to an actual Rust vtable. Although &dyn Bar references would always contain a valid vtable, the same is not necessarily true for a raw pointer like *const dyn Bar or *mut dyn Bar.

In the language today, we only support "noop upcasts" that don't affect the vtable, and these are safe (e.g., converting from *const dyn Foo + Send to *const dyn Foo). If we extend the set of upcasts to permit vtable-adjusting upcasts, like *const dyn Bar to *const dyn Foo, this implies that, for safe code at least, all *const dyn Trait values must have a valid vtable, so that we know we can safely load the required field and perform the upcast.

On the other hand, we do not generally require raw *mut T pointers to point to valid data. In fact, we explicitly permit them to have any value, including null, and only require that they point to valid data when they are dereferenced. Because dereferencing a raw pointer is an unsafe operation, it has always been considered safe to expose an arbitrary raw pointer to unsafe code -- the unsafety arises when you take a raw pointer from an unknown source and dereference it, since unless you can trace the origin of that pointer you can't possible guarantee that it is valid to dereference.

This brings us to the conflict:

There have been requests to extend traits with the option to include raw pointer methods:

trait PtrLike { fn is_null(v: *const Self) -> bool; }

These methods would be useful when writing unsafe code because having an &self method requires satisfying the validity conditions of an &-reference, which may not be possible. If we did have such methods, however, it raises the question of whether it would be safe to invoke is_null on a *const dyn PtrLike reference. Just as with upcasting, invoking a method from the vtable requires loading from the vtable, and hence requires a valid vtable generated by the compiler.

The solution we propose in this document also resolves this future dilemma.

Definitions: validity vs safety invariant

We adopt the terms validity and safety invariant from the unsafe code guidelines:

The validity invariant is an invariant that all data must uphold any time it is accessed or copied in a typed manner. This invariant is known to the compiler and exploited by optimizations such as improved enum layout or eliding in-bounds checks.

The safety invariant is an invariant that safe code may assume all data to uphold. This invariant is used to justify which operations safe code can perform. The safety invariant can be temporarily violated by unsafe code, but must always be upheld when interfacing with unknown safe code. It is not relevant when arguing whether some program has UB, but it is relevant when arguing whether some code safely encapsulates its unsafety -- in other words, it is relevant when arguing whether some library is sound.

In short, the validity invariant defines a condition that must always be true, even in unsafe code, and the safety invariant defines an invariant that unsafe code must guarantee before a value can be released to be used by arbitrary code.

Contours of the solution space

We can fix this conflict in one of two basic ways:

First, we could make vtable-adjusting upcasts casts (and *Self method calls) unsafe. This is difficult to implement and would require changes to the Coerce trait, which is already excessively complicated. In exchange, it offers at best marginal benefit: raw *dyn pointers can be released to safe code, but safe code can't do anything interesting with them. For this reason, we do not recommend this option.

If vtable-adjusting casts (and *Self method calls) are safe, then the safety invariant for *dyn types must be that their metadata points to a fully valid vtable (i.e., a vtable created by the compiler). This ensures safe code can perform upcasts or dynamic dispatch. This also implies that std::ptr::null (which is safe) cannot be extended to T where T: ?Sized unless further changes are made, since we would need to provide a valid vtable (it would be possible to permit a sentinel value, like null, to be used, but that would imply that upcasting must have a branch, making it less efficient).

There are, however, various options for the validity invariant, ranging from no invariant to requiring a fully valid vtable at all times. The strict invariants offer some benefits, such as the ability to have a niche for *dyn pointers. We survey the options here:

Validity invariant for *dyn metadata Supports niche Can be initialized with std::mem::zeroed Constant initializer
None
Word-aligned
Word-aligned, non-null
Valid vtable

Explanations for the column titles:

Other points:

Proposal

The proposal is as follows:

Vtable-adjusting upcasts are defined as:

This approach...

The rules also imply that...

Possible future changes

It may be possible to weaken the validity or safety invariants later, but we risk finding that people have written unsafe code that relies on them. For example, people could build data structures using unions that assume that the vtable pointer is non-null. If we then later permitted a null value, this code could create UB. This is particularly problematic for changes to the safety invariant, since the assumption is that one can take a value which meets the safety invariant and give it to arbitrary code without creating UB (if the value only meets the validity invariant, and not the safety invariant, then you are supposed to audit and control all the code which uses that value, so this is less of a problem).