What about: Pointer-to-integer transmutes? · Issue #286 · rust-lang/unsafe-code-guidelines
Transmuting pointers to integers (i.e., not going through the regular cast) is a problem. This is demonstrated by the following silly example:
fn example(ptr: *const i32, cmp: usize) -> usize { unsafe {
    let mut storage: usize = 0;
    *(&mut storage as *mut _ as *mut *const i32) = ptr; // write at ptr type
    let val = storage; // read at int type (0)
    storage = val; // redundant write back (1)
    external_function(&storage); // just making sure the value in storage can be observed
    if val == cmp {
        return cmp; // could exploit integer equivalence (2)
    }
    return 0;
} }
Imagine executing this code on the Abstract Machine, taking into account that pointers have provenance, i.e., that a ptr-to-int conversion loses information. Now what happens at point (0)? Here we read the data stored in `storage` at type `usize`. That data, however, is the pointer `ptr`, i.e., it has provenance. What should happen with that provenance at (0)?
- We could drop the provenance. That would basically mean that the load of `storage` acts like an implicit ptr-to-int cast. The problem with this approach is that we cannot remove the redundant write at (1): the value in `val` is different from what is stored in `storage`, since `val` has no provenance but the `ptr` stored in `storage` does! This is basically another version of https://bugs.llvm.org/show_bug.cgi?id=34548: ptr-to-int casts are not NOPs, and a ptr-int-ptr roundtrip cannot be optimized away. If a load, like at (0), can perform a ptr-to-int cast, then the same concerns apply here.
- We could preserve the provenance. Then, however, we end up with `val` having type `usize` and also having provenance, which is a big problem: the compiler might decide, at program point (2), to `return val` instead of `return cmp` (based on the fact that `val == cmp`), but if `val` could have provenance then this transformation is wrong! This is basically the issue at the heart of my blog post on provenance: `==` ignores provenance, so just because two values are equal according to `==` does not mean they can be used interchangeably in all circumstances.
- What other option is there? Well, we might make the load return `poison` -- effectively declaring ptr-to-int transmutes as UB.
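For contrast with the transmute-style write in `example`, here is a minimal sketch of the "regular cast" route mentioned at the top. The function names `addr_via_cast` and `roundtrip` are made up for illustration. The sketch assumes that an explicit `as` cast remains the place where provenance is deliberately given up, which is how I understand the current model, not something this issue settles.

```rust
/// Explicit ptr-to-int cast: the conversion is visible in the source,
/// so the compiler knows exactly where provenance is discarded.
fn addr_via_cast(ptr: *const i32) -> usize {
    ptr as usize
}

/// Explicit ptr-int-ptr roundtrip using only `as` casts.
/// (Hypothetical helper; per the LLVM bug linked above, such a
/// roundtrip is not a NOP and cannot simply be optimized away.)
fn roundtrip(ptr: *const i32) -> *const i32 {
    (ptr as usize) as *const i32
}

fn main() {
    let x = 42;
    let p = &x as *const i32;

    // The integer we get carries only the address.
    let addr = addr_via_cast(p);
    assert_eq!(addr, p as usize);

    // Casting back yields a pointer to the same address.
    let q = roundtrip(p);
    assert_eq!(q as usize, addr);
    assert_eq!(unsafe { *q }, 42);
}
```

The difference from `example` is only that the conversion here is an explicit operation rather than a side effect of loading memory at a different type.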
The last option is what is being proposed to LLVM, along with a new "byte" type such that loading at type `bN` would preserve provenance, but loading at type `iN` would turn bytes with provenance into `poison`. On the flip side, no arithmetic or logical operations are possible on `bN`; that type represents "opaque bytes" with the only possible operations being load and store (and explicit casts to remove any provenance that might exist). This leads to a consistent model in which both redundant store elimination and GVN substitution on integer types (the optimizations mentioned above) are possible. I don't know any other way to resolve the contradiction that otherwise arises from doing both of these optimizations. However, the LLVM discussion is still in its early stages, and there were already a lot of responses that I have not read in detail yet. If this ends up being accepted, we on the Rust side will have to figure out if and how we can make use of the new "byte" type and its explicit casts (to pointers or integers).
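To make the `bN` versus `iN` distinction concrete, here is a toy model of abstract memory in Rust. This is only a sketch under my own assumptions: all names (`Provenance`, `AbstractByte`, `IntLoad`, `store_ptr`, `load_as_bytes`, `load_as_int`, `bytes_to_int`) are hypothetical, pointers are assumed to be 8 little-endian bytes, and none of this is LLVM's actual API or the precise semantics of the proposal.

```rust
/// Hypothetical tag for the allocation a pointer was derived from.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Provenance(u32);

/// One byte of abstract memory: a value plus optional provenance.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct AbstractByte {
    value: u8,
    provenance: Option<Provenance>,
}

/// Result of loading at an integer type (`iN` in the proposal).
#[derive(Debug, PartialEq, Eq)]
enum IntLoad {
    Value(u64),
    Poison,
}

/// Storing a pointer tags every byte of the address with its provenance.
fn store_ptr(addr: u64, prov: Provenance) -> [AbstractByte; 8] {
    addr.to_le_bytes()
        .map(|value| AbstractByte { value, provenance: Some(prov) })
}

/// Storing a plain integer produces bytes without provenance.
fn store_int(v: u64) -> [AbstractByte; 8] {
    v.to_le_bytes()
        .map(|value| AbstractByte { value, provenance: None })
}

/// Load at "byte" type: the bytes come back unchanged, provenance included.
fn load_as_bytes(mem: &[AbstractByte; 8]) -> [AbstractByte; 8] {
    *mem
}

/// Load at integer type: any byte with provenance makes the result poison.
fn load_as_int(mem: &[AbstractByte; 8]) -> IntLoad {
    if mem.iter().any(|b| b.provenance.is_some()) {
        return IntLoad::Poison;
    }
    IntLoad::Value(u64::from_le_bytes(mem.map(|b| b.value)))
}

/// Explicit cast from opaque bytes to an integer: strips provenance.
fn bytes_to_int(bytes: [AbstractByte; 8]) -> u64 {
    u64::from_le_bytes(bytes.map(|b| b.value))
}

fn main() {
    let mem = store_ptr(0x1000, Provenance(1));

    // Reading pointer bytes at integer type yields poison in this model.
    assert_eq!(load_as_int(&mem), IntLoad::Poison);

    // Reading at byte type preserves everything, so writing the loaded
    // value straight back would be a redundant store.
    let bytes = load_as_bytes(&mem);
    assert_eq!(bytes, mem);

    // Only the explicit cast produces a provenance-free integer.
    assert_eq!(bytes_to_int(bytes), 0x1000);

    // Plain integers load normally.
    assert_eq!(load_as_int(&store_int(7)), IntLoad::Value(7));
}
```

In this model an integer value never carries provenance, so substituting one equal integer for another stays valid, while a byte-typed load followed by a store of the same value changes nothing in memory.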
This thread is about discussing how we need to restrict ptr-to-int transmutes when pointers have provenance but integers do not. See #287 for a discussion with the goal of avoiding provenance in the first place.