ACP: add least_significant_one
and most_significant_one
to integer types and NonZero
types · Issue #467 · rust-lang/libs-team (original) (raw)
Proposal
Problem statement
It is common in bit-twiddling code to want to isolate the most significant set bit or the least significant set bit on an integer, for a very wide variety of purposes. Currently in Rust it is quite error prone to do this, for the least significant bit you must remember the following incantation:
let lsb = x & x.wrapping_neg();
and for the most significant bit the following:
let msb = x & ((1 << (T::BITS - 1)).wrapping_shr(x.leading_zeros()));
These are just the most efficient implementations I've come up with, on x86-64 they compile to 1 and 4 branchless instructions respectively:
.lsb blsi rax, rdi
.msb lzcnt rax, rdi movabs rcx, -9223372036854775808 shrx rax, rcx, rax and rax, rdi
Most people will likely find or come up with worse implementations, or worse, buggy ones. Off-by-one errors in shift amounts, or accidental wrapping of shift amounts are common.
Motivating examples or use cases
As mentioned, these functions are widely used in all kinds of bit twiddling. Here are two concrete usage examples but these don't even begin to scratch the surface of the ways in which they could be used.
Summing elements from an array masked by a bitset:
fn masked_sum(x: &[f32; 32], mut m: u32) -> f32 { let mut s = 0.0; while m != 0 { s += x[m.trailing_zeros() as usize]; m ^= m.least_significant_one(); } s }
Doing a bitwise binary search on an array:
fn first_greater_or_equal(haystack: &[u32], needle: u32) -> usize { let mut bit = haystack.len().most_significant_one(); let mut idx = 0; while bit != 0 { if let Some(v) = haystack.get((idx | bit) - 1) { if *v < needle { idx |= bit; } } } }
Solution sketch
I propose we add the following two functions to all integer types, and all NonZeroT
types:
/// Returns the input with all bits except the least significant set bit masked out. Returns 0 if the input is 0. /// /// # Examples /// assert_eq!(u8::least_significant_one(0b01100100), 0b00000100); /// assert_eq!(u8::least_significant_one(0), 0); fn least_significant_one(self) -> Self;
/// Returns the input with all bits except the most significant set bit masked out. Returns 0 if the input is 0. /// /// # Examples /// assert_eq!(u8::most_significant_one(0b01100100), 0b01000000); /// assert_eq!(u8::most_significant_one(0), 0); fn most_significant_one(self) -> Self;
While one could potentially argue that for the normal integer types they should return Option<Self>
, this is undesirable. There are efficient implementations available that do not check for the zero input, and checking for zero adds extra branches that the caller almost surely does not want. It is always trivial to add a zero check yourself if the behavior at zero is not what you want, but in my experience zero-in zero-out is always what you want for these functions.
The NonZero
variants never have this concern and can always return Self
in any design, as there is guaranteed to be a bit set.
Alternatives
For most_significant_one
there is uN::next_power_of_two
defined on unsigned integers which almost-but-not-quite does what you want, and a proposal for prev_power_of_two
which does what you want but not for the zero input. However these methods:
- are unavailable for signed integers,
- have nasty edge cases around the maximum value or 0 (they always need to output a power of two, which is nasty with zero, and they can wrap for
next
), and - have a questionable name since they don't actually provide the next/prev power of two if the input is already a power of two, making their function ambiguous/confusing for the unaware.
The least_significant_one
and most_significant_one
are well-defined for both signed and unsigned integers on all inputs except zero, and can trivially be extended to zero without conflicting with the "must return a power of two" requirement.
Alternative names
To get some bikeshedding out of the way, here are some alternative names:
least_sig_one
/most_sig_one
lowest_one
/highest_one
lowest_bit_set
/highest_bit_set