Make slice iterators carry only a single provenance by scottmcm · Pull Request #122971 · rust-lang/rust

That particular test is here because it actually works already, not because it's particularly likely. The better tests are those mentioned in llvm/llvm-project#86417

Basically, it matters most when you're storing an iterator in your own type, rather than just doing myslice.iter().bunch().of().stuff().

For example, I was talking with @saethlin a while ago about rustc's MemDecoder. It currently uses unsafe code because we couldn't find a way to do things optimally with normal slices or slice iterators.

The exemplar of the problem is basically this function:

fn read_u32(d: &mut MemDecoder) -> Option<u32>;

That's trivial to write if the decoder stores a slice (https://rust.godbolt.org/z/vW95ebW7j), but storing a slice in it works poorly for exactly the same reason that slice iterators don't store one: it has to update both the pointer and the length when you step forward.
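As a rough illustration of the slice-storing shape (my own minimal sketch, not necessarily the code behind the godbolt link), assume a hypothetical SliceDecoder standing in for MemDecoder and assume little-endian encoding:

```rust
/// Hypothetical stand-in for a decoder that stores a plain slice.
pub struct SliceDecoder<'a> {
    pub data: &'a [u8],
}

/// Reads four bytes as a little-endian u32 (the byte order is an assumption).
pub fn read_u32(d: &mut SliceDecoder<'_>) -> Option<u32> {
    if d.data.len() < 4 {
        return None;
    }
    let (head, rest) = d.data.split_at(4);
    // A `&[u8]` is a (pointer, length) pair, so this one assignment
    // has to store two updated fields back into the decoder.
    d.data = rest;
    Some(u32::from_le_bytes([head[0], head[1], head[2], head[3]]))
}
```

The logic is simple, but every successful read advances the pointer and shrinks the length.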

So ok, what if you change it to store an iterator, since those have already fixed this problem?

Well, if you make the obvious change (https://rust.godbolt.org/z/aWcbGvnzK), it still writes twice: it ends up writing back both the start and end pointers. By using the slice helper (after all, there's no equivalent iterator helper) it actually changes the provenance of the end pointer, as far as the optimizer knows. Proof that LLVM is definitely not allowed to optimize it: https://alive2.llvm.org/ce/z/jifMAC
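A sketch of what that kind of change might look like, again with a hypothetical IterDecoder type and an assumed little-endian encoding (this is my reconstruction of the pattern, not a copy of the linked code):

```rust
use std::slice;

/// Hypothetical stand-in for a decoder that stores a slice iterator.
pub struct IterDecoder<'a> {
    pub iter: slice::Iter<'a, u8>,
}

/// Reads four bytes as a little-endian u32 (the byte order is an assumption).
pub fn read_u32(d: &mut IterDecoder<'_>) -> Option<u32> {
    // Go iterator -> slice so the slice helper can be used...
    let data = d.iter.as_slice();
    if data.len() < 4 {
        return None;
    }
    let (head, rest) = data.split_at(4);
    // ...then slice -> iterator. The new end pointer is recomputed as
    // `rest.as_ptr() + rest.len()`, i.e. it is derived from the start pointer
    // rather than being the end pointer the iterator held before.
    d.iter = rest.iter();
    Some(u32::from_le_bytes([head[0], head[1], head[2], head[3]]))
}
```

The end address never changes here, but the round trip through the slice means the stored end pointer is a freshly derived value.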

But if the iterator carries only a single provenance, LLVM becomes allowed to optimize out things like that (Alive proof: https://alive2.llvm.org/ce/z/R327wi).

And unfortunately nikic says (https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/Communicating.20same-provenance.20to.20LLVM/near/425757728) there's no good way to tell LLVM they have the same provenance.

So if we want iter <-> slice to be actually zero-cost, we either need to do something like this so there's only one provenance, or find a way to tell LLVM about it.

Note that p + (q - p) does actually optimize out in the assembly generation part of LLVM (https://llvm.godbolt.org/z/e3Yrd7WzK), just not in the middle-end.
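To make the p + (q - p) pattern concrete, here is a small Rust rendering of it. The function name rebuild_end is hypothetical, and this is only an illustration of the shape, not the IR from the link:

```rust
/// Computes p + (q - p) on addresses. The result has the same address as `q`,
/// but it is derived from `p`, so it carries `p`'s provenance.
pub fn rebuild_end(p: *const u8, q: *const u8) -> *const u8 {
    let len = (q as usize).wrapping_sub(p as usize);
    p.wrapping_add(len)
}

fn main() {
    let arr = [0u8; 8];
    let p = arr.as_ptr();
    let q = p.wrapping_add(5);
    // Pointer equality compares addresses, which match.
    assert_eq!(rebuild_end(p, q), q);
}
```

Replacing the result with q itself is only valid if p and q are known to share a provenance, which is the information that is missing when the two pointers are tracked separately.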