change equate for binders to not rely on subtyping by spastorino · Pull Request #118247 · rust-lang/rust (original) (raw)

summary by @spastorino and @lcnr

Context

The following code:

type One = for<'a> fn(&'a (), &'a ()); type Two = for<'a, 'b> fn(&'a (), &'b ());

mod my_api { use std::any::Any; use std:📑:PhantomData;

pub struct Foo<T: 'static> {
    a: &'static dyn Any,
    _p: PhantomData<*mut T>, // invariant, the type of the `dyn Any`
}

impl<T: 'static> Foo<T> {
    pub fn deref(&self) -> &'static T {
        match self.a.downcast_ref::<T>() {
            None => unsafe { std::hint::unreachable_unchecked() },
            Some(a) => a,
        }
    }
    
    pub fn new(a: T) -> Foo<T> {
       Foo::<T> {
            a: Box::leak(Box::new(a)),
            _p: PhantomData,
        } 
    }
}

}

use my_api::*;

fn main() { let foo = Foo::::new((|_, _| ()) as One); foo.deref(); let foo: Foo = foo; foo.deref(); }

has UB from hitting the unreachable_unchecked. This happens because TypeId::of::<One>() is not the same as TypeId::of::<Two>() despite them being considered the same types by the type checker.

Currently the type checker considers binders to be equal if subtyping succeeds in both directions: for<'a> T<'a> eq for<'b> U<'b> holds if for<'a> exists<'b> T<'b> <: T'<a> AND for<'b> exists<'a> T<'a> <: T<'b> holds. This results in for<'a> fn(&'a (), &'a ()) and for<'a, 'b> fn(&'a (), &'b ()) being equal in the type system.

TypeId is computed by looking at the structure of a type. Even though these types are semantically equal, they have a different structure resulting in them having different TypeId. This can break invariants of unsafe code at runtime and is unsound when happening at compile time, e.g. when using const generics.

So as seen in main, we can assign a value of type Foo::<One> to a binding of type Foo<Two> given those are considered the same type but then when we call deref, it calls downcast_ref that relies on TypeId and we would hit the None arm as these have different TypeIds.

As stated in #97156 (comment), this causes the API of existing crates to be unsound.

What should we do about this

The same type resulting in different TypeIds is a significant footgun, breaking a very reasonable assumptions by authors of unsafe code. It will also be unsound by itself once they are usable in generic contexts with const generics.

There are two options going forward here:


Advantages of the first approach:

General thoughts:

Advantages of the second approach:


This PR goes with the second approach. A crater run did not result in any regressions. I am personally very hesitant about trying the first approach due to the above reasons. It feels like there are more unknowns when going that route.

Changing the way we equate binders

Relating bound variables from different depths already results in a universe error in equate. We therefore only need to make sure that there is 1-to-1 correspondence between bound variables when relating binders. This results in concrete types being structurally equal after anonymizing their bound variables.

We implement this by instantiating one of the binder with placeholders and the other with inference variables and then equating the instantiated types. We do so in both directions.

More formally, we change the typing rules as follows:

for<'r0, .., 'rn> exists<'l0, .., 'ln> LHS<'l0, .., 'ln> <: RHS<'r0, .., 'rn>
for<'l0, .., 'ln> exists<'r0, .., 'rn> RHS<'r0, .., 'rn> <: LHS<'l0, .., 'ln>
--------------------------------------------------------------------------
for<'l0, .., 'ln> LHS<'l0, .., 'ln> eq for<'r0, .., 'rn> RHS<'r0, .., 'rn>

to

for<'r0, .., 'rn> exists<'l0, .., 'ln> LHS<'l0, .., 'ln> eq RHS<'r0, .., 'rn>
for<'l0, .., 'ln> exists<'r0, .., 'rn> RHS<'r0, .., 'rn> eq LHS<'l0, .., 'ln>
--------------------------------------------------------------------------
for<'l0, .., 'ln> LHS<'l0, .., 'ln> eq for<'r0, .., 'rn> RHS<'r0, .., 'rn>

Fixes #97156

r? @lcnr