widening_mul by strake · Pull Request #2417 · rust-lang/rfcs (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed

widening_mul #2417

Conversation40 Commits5 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

le-jzr, JustAPerson, ActuallyaDeviloper, novacrazy, birkenfeld, zayenz, kennytm, simonlindholm, est31, AaronKutch, and 19 more reacted with thumbs up emoji

I think we could come up with a better name for the method, but otherwise this is something I'd very much like to see added. Given that on some architectures (x86 to name one), multiplication always produces result twice the width of the operands, it seems rather inefficient to not provide a way to use the higher half.

@le-jzr We can also choose an efficient-as-possible implementation on architectures where it doesn't and save users the bother.

Some name ideas:

carrying_mul
double_mul
double_wide_mul

For reference, here is the generic definition from my u2size crate:

#[inline(always)] fn carrying_mul(x: usize, y: usize) -> (usize, usize) { let n = 0usize.count_zeros() >> 1; let halves = |x| (x >> n, x & (!0 >> n)); let ((x1, x0), (y1, y0)) = (halves(x), halves(y)); let (z2, z0) = (x1 * y1, x0 * y0); let z1 = (x1 + x0) * (y1 + y0) - z2 - z0; let (w, c) = usize::overflowing_add(z0, z1 << n); (z2 + z1 >> n + c as u32, w) }

The emitted machine code for this is quite heinous.

Having never heard of this operation before, but immediately understanding why it would be useful, I think the name needs to have some form of the word "wide" in it.

carrying_mul sounds as if regular multiplication doesn't "carry the 1", which would just be inaccurate. double_mul sounds like multiplying twice, or multiplying by two, which are totally different operations. double_wide_mul I'd be fine with, but is probably overkill unless there's a quadruple_wide_mul we need to contrast it with. Finally, std already has wrapping_mul and overflowing_mul methods on these types, so for naming consistency I would call it widening_mul.

ActuallyaDeviloper, eddyb, kennytm, gbutler69, wesleywiser, JustAPerson, Alexhuszagh, fintelia, burdges, Tom-Phinney, and 2 more reacted with thumbs up emoji

After a bit of though, I see the logic behind widening_mul. It could work.

@strake I know it's not really the point, but couldn't you do something like this?

#[inline(always)] fn carrying_mul(x: usize, y: usize) -> (usize, usize) { let usize_bits = 0usize.count_zeros(); if usize_bits <= 32 { let z = (x as u64) * (y as u64); ((z >> usize_bits) as usize, z as usize) } else if usize_bits <= 64 { let z = (x as u128) * (y as u128); ((z >> usize_bits) as usize, z as usize) } else { // fallback let n = usize_bits >> 1; let halves = |x| (x >> n, x & (!0 >> n)); let ((x1, x0), (y1, y0)) = (halves(x), halves(y)); let (z2, z0) = (x1 * y1, x0 * y0); let z1 = (x1 + x0) * (y1 + y0) - z2 - z0; let (w, c) = usize::overflowing_add(z0, z1 << n); (z2 + z1 >> n + c as u32, w) } }

Maybe instead of returning a tuple, it would make sense to return a struct? So e.g. u32::carrying_mul() would return a carried_product<u32>. This struct would have high and low fields. This makes it hard to confuse the fields, which would otherwise be an error-prone and hard to lint against. (Feel free to bikeshed on all the names here.)

I have only seen this called "widening multiply", but mostly in low-level (compiler/ISA) contexts.

I do wonder why this is a Mul impl on u2size rather than a mul of two usize values that returns a u2size (or equivalent such as a 2-tuple of usize, which I'd prefer over a new type here).

The broader rust project already has this in compiler-builtins (https://github.com/rust-lang-nursery/compiler-builtins/blob/master/src/int/mul.rs#L7), so I'd personally be happy to have it in core instead.

A quick playground experiment shows the ASM is simpler for returning (low, high) with u64. (For smaller types it didn't matter.) Conveniently that's consistent with overflowing_mul which also returns the low part of the result as the .0 part of the tuple.

But I wonder if this really wants to be ternary instead, like

fn mul_with_carry(self, multiplier: Self, carry: Self) -> (Self, Self)

Despite what the drawbacks section says, `a as u128 * b as u128` results in ideal code for widening u64 multiplication (with or without optimisations, provided overflow checks are disabled.) Very similar results are obtained if result is split into two parts afterwards.

Regarding output tuple order, I think it could be made more obvious using the method name. For example fn mul_low_high(self, multiplier: Self) -> (Self, Self) makes it more obvious that ret.0 is the low word and ret.1 is the high word, and let (high, low) = a.mul_low_high(b); sets off alarm bells.

To me this operation falls in the same category as leading/trailing zeros, rotates, bit reverse, saturation, etc. in that it

is relatively niche, but important for some crucial routines and sometimes provided by hardware
is not provided directly by C and friends, and thus generally implemented with intrinsics or in terms of other operators (e.g., rotate ~= bit shifts and bitwise or) that then have to be pattern matched against to recognize the operation
(on some targets, at least) requires special casing by instruction selection to achieve good codegen, so the aforementioned pattern matching is important

For these operations I'm in favor of providing them in the Rust standard library, both to offer a canonical way to write out these important primitives and to make it easier for the compiler to generate good code reliably (i.e., no need to pattern match a particular way to implement the operation).

kennytm, wesleywiser, scottmcm, pyfisch, TheIronBorn, ActuallyaDeviloper, Tom-Phinney, workingjubilee, farteryhr, bluebear94, and afetisov reacted with thumbs up emoji golddranks and TheIronBorn reacted with heart emoji

@joshtriplett the Mul impl is merely an example of use; the actual method is fn(usize, usize) -> (usize, usize).

@nagisa yes, it does for me too, but i as a crate author worry whether it will remain so on other targets — will it generate such code on a 32-bit machine? a 128-bit machine? is Rust willing to preclude ever supporting 128-bit targets (e.g. RV128I)? I would prefer to have a method which means efficient double-wide multiplication rather than rely on the compiler to figure out what i mean and myself to not mess it up every time.

@rkruppe thanks for verbalizing these points — yes, the notion is to have a canonical form so neither the compiler nor a future reader needs to guess what is going on.

widening_mul LGTM, will modify RFC when we reach a consensus.

Personally I think that the return type should be the next size up of an integer type. I wouldn't mind creating u256 and i256 types that only allow bit operations and casting, but we can also just not define this for 128-bit integers.

@clarcharr I'd love to have u256 (and maybe even u512) available, but not with just a limited set of operations. But that aside, I think we can universally define these operations as a function from T, T to (T, T), even for the largest type we have. Having versions that turn into the next larger type, for every type other than the largest, seems useful as well, but I'd hope that the LLVM optimizer makes it unnecessary to have two different versions.

As the order of the high and low half of the result tuple is not obvious suggests to me that a tuple is probably not the best return type. The obvious solution would be a struct with two fields named "high" and "low", but I'm wondering if simply returning a wider integer type is not simpler (though that means i128/u128 will unfortunately lack the method).

I.e. add the following functions (and the equivalent for unsigned integers) to the standard library.

impl i8 {
    pub fn widening_mul(self, rhs: Self) -> i16 { ... }
}
impl i16 {
    pub fn widening_mul(self, rhs: Self) -> i32 { ... }
}
impl i32 {
    pub fn widening_mul(self, rhs: Self) -> i64 { ... }
}
impl i64 {
    pub fn widening_mul(self, rhs: Self) -> i128 { ... }
}
impl isize {
    #[cfg(target_pointer_width = "16")]
    pub fn widening_mul(self, rhs: Self) -> i32 { ... }
    #[cfg(target_pointer_width = "32")]
    pub fn widening_mul(self, rhs: Self) -> i64 { ... }
    #[cfg(target_pointer_width = "64")]
    pub fn widening_mul(self, rhs: Self) -> i128 { ... }
}

@JonSeverinsson Not sure there is much of a point in doing that because you can simply cast to the wider type and perform the multiplication.

Returning twice as wide type instead of a tuple has the issue that it wouldn't work on usize, which is the canonical use case for this kind of operation (at least until someone adds ireg/ureg).

I'd be fine with returning either a tuple or a struct. As @le-jzr noted, returning twice-as-wide type is impossible for usize (unless we define u2size in core, which i consider far out of scope of this RFC).

Does it make sense to define widening_mul on signed integers?

fn widening_mul(self, other: isize) -> (isize, usize);

Double-wide multiplication is a prerequisite of arbitrary-precision multiplication

I took a stab at the simplest version of that using the proposed method, and I think it ends up being something like this:

fn scalar_mul_assign(big: &mut Vec, scalar: u8) { let mut carry = 0; for i in big.iter_mut() { let (low, high) = i.wide_mul(scalar); let (low, overflowed) = low.overflowing_add(carry); *i = low; carry = high + if overflowed { 1 } else { 0 }; } if carry > 0 { big.push(carry); } }

That's not great, and I doubt LLVM will do a great job with it.

So I think it'd be good to have the carry built-in, so it can be something like

pub fn scalar_mul_assign2(big: &mut Vec, scalar: u8) { let mut carry = 0; for i in big.iter_mut() { let (x, c) = i.mul_with_carry(scalar, carry); *i = x; carry = c; } if carry > 0 { big.push(carry); } }

Since one can always pass 0 as the carry, which LLVM will fold away easily.

+1, would make writing certain cryptographic code very simple

One interesting use case for this might be a more efficient version of the multiply_and_divide function that num-integer uses internally:

/// Calculate r * a / b, avoiding overflows and fractions.
///
/// Assumes that b divides r * a evenly.
fn multiply_and_divide<T: Integer + Clone>(r: T, a: T, b: T) -> T {
    // See http://blog.plover.com/math/choose-2.html for the idea.
    let g = gcd(r.clone(), b.clone());
    r/g.clone() * (a / (b/g))
}

I mean, num-integer specifically almost certainly won't use this as it's not backwards compatible, but it is a nice use case.

ping

have we reached consensus on the name widening_mul?

The rand library uses wmul

@sm-Fifteen It is no coincidence I wrote that function as a non-allocating bigint multiplication helper when I ported APFloat from C++ and #2417 (comment) - so don't count my "vote" for widening_mul twice, heh.

strake changed the title~~carrying_mul~~ widening_mul

Dec 5, 2018

Consensus for name seems to be widening_mul; modified RFC.

What's the rationale for having u2size at all? Given that usize is supposed to be used for pointers, offsets and memory sizes, if multiplication involving it overflows, it means the result represents an amount of memory you cannot possibly allocate or refer to. You shouldn't even care what the value is: all that really matters is that it's too large to handle.

@fstirlitz One use case would be intermediate results that may overflow, e.g. rationals. While the end result of x*y/z might fit in a usize, the product x*y may not.

@fstirlitz I had 2 motivating use cases for u2size:

Intermediate values in arbitrary-precision multiplication
The which-trees-present value in smoothsort, which is word_size/log₂((1+√5)/2) bits

Would be good to have the variants that return the wider type directly, since it has direct uses for a number of common dsp operations.

@lu-zero u2size is not in core/std; i merely cite it as an example use of this operation. For fixed-width types, you can merely cast to the wider type before multiplying, and the emitted code will use the machine instruction for widening multiplication.

I think it's sad that this petered out. This is a small, useful API addition which apparently isn't too controversial or prone to vigorous bikeshedding (except the name, but that's now resolved). Maybe we should just skip ahead to implementing it? Evidently the RFC process is not working well for keeping this RFC PR moving.

Maybe we should just skip ahead to implementing it?

That's usually a good idea for small libs features.

So I found myself needing this again today, and it looks like the overall decision was to just implement this? I would be willing to help do that, but I honestly have no idea how -- I'm not sure what LLVM intrinsics are used to do wide-multiplication, and I couldn't find the docs for it. If we at least add some intrinsics for this it would make implementing a working solution easier, and we can figure out the exact API later.

@clarfonthey there's no intrinsic for it; it represents it with n[su]w multiplication on the wider integer type.

Demo, with beautiful assembly: https://rust.godbolt.org/z/86of1z184

(I do think that people writing Rust shouldn't need to know that, though, and I could imagine the best way to implement it in cranelift could be different, which would be another reason to have an exposed specialized method for it.)

m-ou-se added a commit to m-ou-se/rust that referenced this pull request

Aug 31, 2021

…ou-se

Add carrying_add, borrowing_sub, widening_mul, carrying_mul methods to integers

This comes in part from my own attempts to make (crude) big integer implementations, and also due to the stalled discussion in RFC 2417. My understanding is that changes like these are best offered directly as code and then an RFC can be opened if there needs to be more discussion before stabilisation. Since all of these methods are unstable from the start, I figured I might as well offer them now.

I tried looking into intrinsics, messed around with a few different implementations, and ultimately concluded that these are "good enough" implementations for now to at least put up some code and maybe start bikeshedding on a proper API for these.

For the carrying_add and borrowing_sub, I tried looking into potential architecture-specific code and realised that even using the LLVM intrinsics for addcarry and subborrow on x86 specifically, I was getting exactly the same assembly as the naive implementation using overflowing_add and overflowing_sub, although the LLVM IR did differ because of the architecture-specific code. Longer-term I think that they would be best suited to specific intrinsics as that would make optimisations easier (instructions like add-carry tend to use implicit flags, and thus can only be optimised if they're done one-after-another, and thus it would make the most sense to have compact intrinsics that can be merged together easily).

For widening_mul and carrying_mul, for now at least, I simply cast to the larger type and perform arithmetic that way, since we currently have no intrinsic that would work better for 128-bit integers. In the future, I also think that some form of intrinsic would work best to cover that case, but for now at least, I think that they're "good enough" for now.

The main reasoning for offering these directly to the standard library even though they're relatively niche optimisations is to help ensure that the code generated for them is optimal. Plus, these operations alone aren't enough to create big integer implementations, although they could help simplify the code required to do so and make it a bit more accessible for the average implementor.

That said, I 100% understand if any or all of these methods are not desired simply because of how niche they are. Up to you. 🤷🏻

Labels

A-arithmetic

Arithmetic related proposals & ideas

A-inherent-impl

Proposals related to inherent implementations

A-primitive

Primitive types related proposals & ideas

Libs-Tracked

Libs issues that are tracked on the team's project board.

T-libs-api

Relevant to the library API team, which will review and decide on the RFC.