Autodiff batching2 by ZuseZ4 · Pull Request #139351 · rust-lang/rust

@oli-obk I'm almost feature-complete, but there are two possible paths forward here, so I'd appreciate some help with the design.

The `*v` variants (`dupv`, `dualvonly`) allow better vectorization by accepting larger shadow arguments.
Each shadow argument for a slice `&[type]` should therefore be `width * num_elements_of_primal_slice * byte_size_of(type)` bytes long.
We currently don't support generics, but we should keep them in mind; type aliases are already supported.
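A minimal sketch of the size formula in plain Rust, assuming `width` is the batch width and using `std::mem::size_of` in place of `byte_size_of`. `shadow_byte_len` and `Real` are hypothetical names for illustration, not code from this PR. It also shows why a hardcoded size of 4 only holds for `f32`:

```rust
/// Hypothetical helper (not part of the PR): bytes needed for a batched
/// (`*v`) shadow of a primal slice, i.e.
/// `width * num_elements_of_primal_slice * byte_size_of(T)`.
fn shadow_byte_len<T>(width: usize, primal: &[T]) -> usize {
    // Once `T` is concrete, its size is known, even if the source
    // spelled it as an alias or a generic parameter.
    width * primal.len() * std::mem::size_of::<T>()
}

// A type alias resolves to its underlying type (`f64`, 8 bytes).
type Real = f64;

fn main() {
    let xs32 = [1.0f32, 2.0, 3.0];
    let xs64: [Real; 3] = [1.0, 2.0, 3.0];
    // Width 4: four shadow lanes for a 3-element primal slice.
    println!("{}", shadow_byte_len(4, &xs32)); // 4 * 3 * 4 = 48
    println!("{}", shadow_byte_len(4, &xs64)); // 4 * 3 * 8 = 96
}
```

Because `size_of::<T>()` is only meaningful once `T` is concrete, this is information that exists after monomorphization rather than in the AST.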

In `rustc_codegen_llvm` you'll see that I hardcoded `byte_size_of(type)` to 4, since my tests only use floats (`f32`).
An upstreamed version of course needs to determine the size more reliably.

In my typetree work (the only part I haven't upstreamed from my fork yet), I have a bit of logic to handle them, but I haven't used it to compute the byte size yet, so I'm not sure whether that's legal: https://github.com/EnzymeAD/rust/blob/322f2226c1f672c9b5e934b15d255ae0d66bd0e2/compiler/rustc_middle/src/ty/typetree.rs#L196

If that's too hard for now, I could merge a workaround that analyzes the types in the AST frontend. It wouldn't support aliases or generics, but it could at least distinguish `&[f32]` from `&[f64]`. It's getting late for me, so I might be missing something obvious, but I feel we should be able to compute the size in `rustc_monomorphize`, which would handle more than that.
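A rough sketch of what the frontend-only workaround amounts to, assuming the element type is only available as it is spelled in the source. `byte_size_from_spelling` is a hypothetical name, and the real frontend works on AST nodes rather than strings:

```rust
/// Hypothetical sketch: guess the element size from how the type is written,
/// before name resolution or monomorphization. Only primitive float names
/// can be recognized this way.
fn byte_size_from_spelling(elem_ty: &str) -> Option<usize> {
    match elem_ty.trim() {
        "f32" => Some(4),
        "f64" => Some(8),
        // An alias (`Real`) or generic parameter (`T`) is just an identifier
        // at this stage, so its size is unknown.
        _ => None,
    }
}

fn main() {
    for ty in ["f32", "f64", "Real", "T"] {
        println!("{ty}: {:?}", byte_size_from_spelling(ty));
    }
}
```

This is why such a workaround can tell `&[f32]` from `&[f64]` but nothing more: at monomorphization time the same question becomes `std::mem::size_of::<T>()` on a concrete type.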

Also, a user might have reasons to specify a larger stride than the one I'd compute by default, so in all combinations I'll let users provide an extra integer after `*v` arguments, which overrides whatever we computed here. Since that makes it easy to index out of bounds, I'll mark the generated functions as `unsafe` in that case. Once we've figured out the part above, I'll add the code and tests for this to make it clearer.
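A minimal sketch of why a user-supplied stride forces `unsafe`, under an assumed lane-major layout where lane `j` of the batched shadow starts at element `j * stride` and the default stride is the primal length `n`. The layout, `required_len`, and `read_lane` are all hypothetical and only illustrate the reasoning, not how the PR generates code:

```rust
/// Hypothetical: minimum number of elements a batched shadow must hold when
/// lane `j` starts at element `j * stride` and each lane has `n` elements.
fn required_len(width: usize, n: usize, stride: usize) -> usize {
    if width == 0 { 0 } else { (width - 1) * stride + n }
}

/// Reads element `i` of lane `lane` through a raw pointer, as generated code
/// would, trusting whatever stride it is given.
///
/// # Safety
/// `shadow` must be valid for reads at offset `lane * stride + i`.
unsafe fn read_lane(shadow: *const f32, stride: usize, lane: usize, i: usize) -> f32 {
    // SAFETY: the caller guarantees the offset is in bounds.
    unsafe { *shadow.add(lane * stride + i) }
}

fn main() {
    let (width, n) = (2, 3);
    // Default stride `n`: two lanes of three elements, packed.
    let packed = [1.0f32, 2.0, 3.0, 10.0, 20.0, 30.0];
    // User-chosen stride 4 (one padding slot per lane) needs 7 elements.
    let padded = [1.0f32, 2.0, 3.0, 0.0, 10.0, 20.0, 30.0];
    assert!(packed.len() >= required_len(width, n, n));
    assert!(padded.len() >= required_len(width, n, 4));
    // SAFETY: buffer lengths checked above; lane 1 < width, i 2 < n.
    let a = unsafe { read_lane(packed.as_ptr(), n, 1, 2) };
    let b = unsafe { read_lane(padded.as_ptr(), 4, 1, 2) };
    println!("{a} {b}"); // 30 30
}
```

Passing stride 4 together with `packed` would read index 6 of a 6-element buffer, which is the out-of-bounds case that motivates marking such generated functions `unsafe`.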