Storing an object as &Header, but reading the data past the end of the header · Issue #256 · rust-lang/unsafe-code-guidelines
This is related to #2, but here the read is not out of bounds of the allocation, the memory is not being written to by other threads, the bytes are not those of a `&mut Blah`, etc. That is to say, the code is really trying to model a dynamically sized type that Rust, for one reason or another, does not support. (Note that there are a number of custom DST proposals.)
So, I heard that it is UB to have a `&T` and read outside the bounds of that `T`, even if conceptually it's a totally in-bounds read. E.g. `T` here may be a ZST, or it may be a header that sits at the head of a trailing array, or it may be a struct that holds the common shared fields of some set of other structs... These are pretty common in unsafe code, as it's a pattern which is both legal and useful in C and C++.
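To make the header-plus-trailing-data shape concrete, here is a minimal sketch. `Header`, `Packet`, `trailing_bytes`, and `demo` are hypothetical names invented for illustration. It assumes the trailing bytes are reached through a raw pointer derived from the whole object, which is the variant generally believed to be fine. The variant this issue asks about would instead start from a `&Header` and do the same offset arithmetic.

```rust
use std::mem::size_of;

/// Hypothetical header that sits in front of `len` trailing bytes.
#[repr(C)]
struct Header {
    len: usize,
}

/// Hypothetical concrete layout: the header plus a 4-byte payload in one object.
#[repr(C)]
struct Packet {
    header: Header,
    payload: [u8; 4],
}

/// Reads the bytes that follow the header.
///
/// Safety: `ptr` must point to a live `Header` that is immediately followed by
/// `len` initialized bytes in the same allocation, and `ptr` must have been
/// derived from a pointer to that whole object (not from a `&Header`).
unsafe fn trailing_bytes<'a>(ptr: *const Header) -> &'a [u8] {
    // Read the length field through the raw pointer.
    let len = unsafe { (*ptr).len };
    // Step past the header to the first trailing byte.
    let data = unsafe { ptr.cast::<u8>().add(size_of::<Header>()) };
    unsafe { std::slice::from_raw_parts(data, len) }
}

fn demo() -> Vec<u8> {
    let packet = Packet {
        header: Header { len: 4 },
        payload: [1, 2, 3, 4],
    };
    // Start from a pointer to the whole packet, then cast down to the header.
    let whole: *const Packet = &packet;
    let header_ptr = whole.cast::<Header>();
    // The questionable variant would be `let h: &Header = &packet.header;`
    // followed by the same arithmetic on `h as *const Header`.
    unsafe { trailing_bytes(header_ptr).to_vec() }
}
```

The only difference between the two variants is where the pointer comes from, which is part of why the problem is so easy to miss.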
It's pretty common in Rust too:
- It's not unheard of for bindings to C APIs to use a `#[repr(C)] struct Foo { _priv: [u8; 0] }`, as this is what bindgen uses. Some of these APIs then go on to use `&Foo` in the Rust code. (This is essentially a workaround for the lack of a stable `extern type`.) This code doesn't read the data, so the only issue would be if we told LLVM it could assume things about the pointer that turn out to be untrue, probably in a situation like cross-language LTO.
- Similarly, I've seen other FFI code that used a `struct CStr([u8; 0])` for a similar purpose: as a version of `std::ffi::CStr` that you can actually pass to C directly. (I even almost did this for `ffi_support::FfiStr`, but went with a pointer inside so I could easily check for code passing in null.)
- `bitvec` has a `BitSlice` type which acts a lot like a slice that magically has bit-level indexing. Internally it's something like `struct BitSlice { _mem: [()] }`, which lets it behave like an unsized type. The "pointer" and length are both specially encoded values that contain both the actual pointer/length as well as bit-level offsets for tracking where within a byte things are. There are a lot of reasons this might be illegal, but I had not thought `mem::size_of_val` returning the wrong value was the actual one.
- `anyhow::Error` internally wraps a `Box<ErrorImpl<()>>`, where `ErrorImpl<T>` contains a vtable, a backtrace, and then the `T`. `ErrorImpl<()>` is used as it behaves as the "common header" for real `ErrorImpl` values. On construction, `Box<ErrorImpl<T>>` is converted to `Box<ErrorImpl<()>>` when stored in the `Error`.

  Whenever a method is called that needs to delegate to the vtable, the `Box<ErrorImpl<()>>` is converted into the right pointer type for the vtable function (one of `&ErrorImpl<()>`, `&mut ErrorImpl<()>`, `Box<ErrorImpl<()>>`), which is called with that pointer. The first thing the vtable function generally does is convert the reference to e.g. `&ErrorImpl<T>`; example: https://github.com/dtolnay/anyhow/blob/99c982128458fecb8d1d7aff9478dd77dac0ee3b/src/error.rs#L538-L545. (I had always kind of thought it wasn't okay to use `Box<T>` here, but I'm surprised that stuff like `&ErrorImpl<()>` to `&ErrorImpl<RealType>` isn't okay either.)
- `wio-rs` contains `VariableSizedBox`, which provides this pattern in library form, and IIUC is mostly intended for the flexible-array-member case. The API attempts to launder pointers to the object, which is... very non-obvious. It seems like it plausibly avoids the issue here, but it's insanely subtle, and if this is the recommended pattern, I suspect it will need a very good nomicon entry. https://github.com/retep998/wio-rs/blob/9bf021178b2d02485f1bd35e6cff41bf52d4a9a2/src/vsb.rs#L98-L113
- I do something similar in `arcstr`, where there's a header and a variable-length segment that trails it. I avoided issues here by luck, as I took great care to avoid ever putting the inner type behind a reference. This was lucky since I wasn't aware of this at all, and did it for other reasons. This was painful as it required hard-coding field offsets.
- This isn't to say anything of the numerous C or C++ APIs which expose polymorphism in this way. In C++ this is how single non-virtual inheritance is represented, so it's especially common, although it was common in C too. Additionally, C code with a flexible array member is in tons of places, and not just Windows APIs.
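The type-erased "common header" shape from the anyhow bullet can be sketched without ever creating a reference to the erased header type. This is a rough illustration only: `Impl`, `Erased`, `describe_as`, and `drop_as` are hypothetical names, not anyhow's actual definitions. The sketch assumes that holding the erased pointer as a `NonNull` and casting back to the real type before forming a reference sidesteps the concern raised here.

```rust
use std::fmt::Debug;
use std::ptr::NonNull;

/// Hypothetical "vtable-like header, then payload" type, loosely modeled on
/// the `ErrorImpl<T>` shape described above (not anyhow's real definition).
#[repr(C)]
struct Impl<T> {
    // Both function pointers know the real `T` behind an erased pointer.
    describe: unsafe fn(NonNull<Impl<()>>) -> String,
    drop_box: unsafe fn(NonNull<Impl<()>>),
    value: T,
}

/// Owns a heap allocation whose real type is `Impl<T>` for some erased `T`.
struct Erased {
    ptr: NonNull<Impl<()>>,
}

unsafe fn describe_as<T: Debug>(ptr: NonNull<Impl<()>>) -> String {
    // Cast the raw pointer back to the real type *before* making a reference.
    let full: &Impl<T> = unsafe { ptr.cast::<Impl<T>>().as_ref() };
    format!("{:?}", full.value)
}

unsafe fn drop_as<T>(ptr: NonNull<Impl<()>>) {
    // Rebuild the original box with its real type so `T` is dropped properly.
    drop(unsafe { Box::from_raw(ptr.cast::<Impl<T>>().as_ptr()) });
}

impl Erased {
    fn new<T: Debug + 'static>(value: T) -> Self {
        let boxed = Box::new(Impl {
            describe: describe_as::<T>,
            drop_box: drop_as::<T>,
            value,
        });
        // `Box::into_raw` yields a pointer that is valid for the whole `Impl<T>`.
        let raw: *mut Impl<T> = Box::into_raw(boxed);
        let ptr = unsafe { NonNull::new_unchecked(raw) }.cast::<Impl<()>>();
        Erased { ptr }
    }

    fn describe(&self) -> String {
        // Read only the header field through the raw pointer; no `&Impl<()>`
        // is ever created.
        let f = unsafe { (*self.ptr.as_ptr()).describe };
        unsafe { f(self.ptr) }
    }
}

impl Drop for Erased {
    fn drop(&mut self) {
        let f = unsafe { (*self.ptr.as_ptr()).drop_box };
        unsafe { f(self.ptr) };
    }
}
```

For example, `Erased::new(42u32).describe()` formats the payload with `Debug`. The cost of this shape is that every header access goes through `unsafe` raw-pointer reads instead of ordinary field access on a reference.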
These are just a few off the top of my head; there's a lot of unsafe code that does this. Personally, I had thought it was allowed so long as you don't go past the actual bounds of the allocation. It makes some sense that it's not, though, unfortunately. (Somehow, I don't think Miri has ever troubled me about it, but it's seeming like that's just luck && coincidence more than anything else.)
Anyway, I think if this is UB we should start being way more vocal about it, because it's a totally legal pattern in C and C++, and common.