Improve ptr::read code in debug builds · Issue #81163 · rust-lang/rust (original) (raw)

In #80290, some people raised concerns about the quality of the code that ptr::write compiles to in debug builds. Given that, to my knowledge, reads are much more common than writes, I would think that one should be much more concerned with the code that ptr::read compiles to -- and currently, there's going to be quite a few function calls in there, so without inlining, that code will be pretty slow.

ptr::read could be improved with techniques similar to what I did for ptr::write (call intrinsics directly, and inline everything else by hand). This would result in (something like) the following implementation: (EDIT see below for why this is wrong)

pub const unsafe fn read(src: *const T) -> T { // We are calling the intrinsics directly to avoid function calls in the generated code // as intrinsics::copy_nonoverlapping is a wrapper function. extern "rust-intrinsic" { fn copy_nonoverlapping(src: *const T, dst: *mut T, count: usize); }

// For the same reason, we also side-step `mem::MaybeUninit` and use a custom `union` directly.
#[repr(C)]
union MaybeUninit<T> {
    init: mem::ManuallyDrop<T>,
    uninit: (),
}

let mut tmp: MaybeUninit<T> = MaybeUninit { uninit: () };
// SAFETY: the caller must guarantee that `src` is valid for reads.
// `src` cannot overlap `tmp` because `tmp` was just allocated on
// the stack as a separate allocated object.
//
// `MaybeUninit` is repr(C), so we can assume `init` is at offset 0, justifying the pointer
// casts.
//
// Finally, since we just wrote a valid value into `tmp`, it is guaranteed
// to be properly initialized.
unsafe {
    copy_nonoverlapping(src, &mut tmp as *mut _ as *mut T, 1);
    mem::transmute_copy(&tmp.init)
}

}

However, here we have the extra difficulty that read is (unstably) a const fn, so the above implementation is rejected. &tmp.init can be replaced by &mut tmp.init and that works (or we wait for a bootstrap bump so we can make use of #80418), but transmute_copy is non-const, so there's still more work to be done. (transmute does not work since the compiler does not recognize that T and ManuallyDrop<T> have the same size.)

I will stop here, but if someone else strongly cares about ptr::read performance/codesize in debug builds, feel free to pick this up and drive it to completion.

Cc @bjorn3 @therealprof @usbalbin