Improve ptr::read code in debug builds · Issue #81163 · rust-lang/rust
In #80290, some people raised concerns about the quality of the code that `ptr::write` compiles to in debug builds. Given that, to my knowledge, reads are much more common than writes, I would think that one should be much more concerned with the code that `ptr::read` compiles to -- and currently, there are going to be quite a few function calls in there, so without inlining, that code will be pretty slow.

`ptr::read` could be improved with techniques similar to what I did for `ptr::write` (call intrinsics directly, and inline everything else by hand). This would result in (something like) the following implementation: (EDIT see below for why this is wrong)
    pub const unsafe fn read<T>(src: *const T) -> T {
        // We are calling the intrinsics directly to avoid function calls in the generated code
        // as `intrinsics::copy_nonoverlapping` is a wrapper function.
        extern "rust-intrinsic" {
            fn copy_nonoverlapping<T>(src: *const T, dst: *mut T, count: usize);
        }

        // For the same reason, we also side-step `mem::MaybeUninit` and use a custom `union` directly.
        #[repr(C)]
        union MaybeUninit<T> {
            init: mem::ManuallyDrop<T>,
            uninit: (),
        }

        let mut tmp: MaybeUninit<T> = MaybeUninit { uninit: () };
        // SAFETY: the caller must guarantee that `src` is valid for reads.
        // `src` cannot overlap `tmp` because `tmp` was just allocated on
        // the stack as a separate allocated object.
        //
        // `MaybeUninit` is repr(C), so we can assume `init` is at offset 0, justifying the pointer
        // casts.
        //
        // Finally, since we just wrote a valid value into `tmp`, it is guaranteed
        // to be properly initialized.
        unsafe {
            copy_nonoverlapping(src, &mut tmp as *mut _ as *mut T, 1);
            mem::transmute_copy(&tmp.init)
        }
    }
However, here we have the extra difficulty that `read` is (unstably) a `const fn`, so the above implementation is rejected. `&tmp.init` can be replaced by `&mut tmp.init` and that works (or we wait for a bootstrap bump so we can make use of #80418), but `transmute_copy` is non-`const`, so there's still more work to be done. (`transmute` does not work since the compiler does not recognize that `T` and `ManuallyDrop<T>` have the same size.)
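For anyone who wants to experiment outside the standard library, here is a minimal sketch of the same union-based idea as an ordinary (non-`const`) function on stable Rust. It is an illustration under stated assumptions, not the proposed libcore change: it calls the public `ptr::copy_nonoverlapping` wrapper instead of the intrinsic, so it does not avoid the function call this issue is about, and it does not address the `const fn` problem at all. The names `Slot` and `read_via_union` are made up for this example.

```rust
use std::mem::{self, ManuallyDrop};
use std::ptr;

// Hypothetical stand-in for the custom union in the snippet above.
#[repr(C)]
union Slot<T> {
    init: ManuallyDrop<T>,
    uninit: (),
}

/// Non-const sketch of a union-based read (illustrative name, not a std API).
///
/// # Safety
/// `src` must be valid for reads, properly aligned, and point to an
/// initialized `T`. As with `ptr::read`, the caller is responsible for not
/// using or dropping the original value in a way that duplicates ownership.
pub unsafe fn read_via_union<T>(src: *const T) -> T {
    let mut tmp: Slot<T> = Slot { uninit: () };
    unsafe {
        // `Slot<T>` is repr(C), so `init` is at offset 0 and the pointer cast
        // to `*mut T` is justified. `tmp` is a fresh local, so it cannot
        // overlap `src`.
        ptr::copy_nonoverlapping(src, &mut tmp as *mut Slot<T> as *mut T, 1);

        // The issue reports that
        //     mem::transmute::<ManuallyDrop<T>, T>(tmp.init)
        // is rejected for generic `T` because the compiler does not see the
        // two types as same-sized. `transmute_copy` only checks sizes at run
        // time, so it is accepted here (but it is not a `const fn`).
        mem::transmute_copy::<ManuallyDrop<T>, T>(&tmp.init)
    }
}
```

A caller would use it like `ptr::read`, for example `unsafe { read_via_union(&x as *const u32) }`. For types that own resources, wrap the original in `ManuallyDrop` (or otherwise forget it) so the value is not dropped twice.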
I will stop here, but if someone else strongly cares about `ptr::read` performance/codesize in debug builds, feel free to pick this up and drive it to completion.