STL (original) (raw)

I noticed that we had a single remaining usage of separately compiled multiprecision arithmetic machinery. I believe that we can replace this with the modern _Unsigned128 type, which is far better tested and more widely exercised. As @AlexGuteniev observed, _Unsigned128 uses intrinsics, so it should also be faster than multprec.cpp. And as a final bonus, it makes this codepath strongly resemble the 32-bit and 64-bit codepaths above.

👨‍🔬 Proof that 128 bits are sufficient

The function comment says:

// Choose intermediate type:
// To use type T for the intermediate calculation, we must show
// _Ax * (_Mx - 1) + _Cx <= numeric_limits::max()
// Split _Cx:
// _Cx <= numeric_limits::max()
// && _Ax * (_Mx - 1) <= numeric_limits::max() - _Cx
// Divide by _Ax:
// _Cx <= numeric_limits::max()
// && (_Mx - 1) <= (numeric_limits::max() - _Cx) / _Ax

The first part, _Cx <= (2^128 - 1), is obviously true.

The next part is (_Mx - 1) <= ((2^128 - 1) - _Cx) / _Ax. This is the hardest to achieve for the largest LHS (caused by the largest _Mx) and the smallest RHS (caused by the largest _Ax and the largest _Cx). So ((2^64 - 1) - 1) <= ((2^128 - 1) - (2^64 - 1)) / (2^64 - 1) is the worst case.

C:\Temp>py
Python 3.13.3 (tags/v3.13.3:6280bb5, Apr  8 2025, 14:47:33) [MSC v.1943 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ((2**64 - 1) - 1)
18446744073709551614
>>> ((2**128 - 1) - (2**64 - 1)) // (2**64 - 1)
18446744073709551616
>>> ((2**64 - 1) - 1) <= ((2**128 - 1) - (2**64 - 1)) // (2**64 - 1)
True

⚙️ Commits

_Uint _Cx always satisfies _Cx <= ULLONG_MAX.
Use _Unsigned128, preserve multprec.cpp for bincompat.
- We can drop _INLINE_VAR from _MP_len, giving it internal linkage within multprec.cpp.
- We don't need extern "C++" on the declarations - that was for modules.
- We need to add _CRTIMP2_PURE to the definitions. I thought we had harmonized all declarations and definitions, but we missed these.
Reduce multprec.cpp dependencies to slightly improve throughput.
- We can hardcode 64 instead of including <limits> for numeric_limits<unsigned long long>::digits.
- uint64_t and unsigned long long are synonyms. In fact, this file was already using unsigned long long for the definitions, even though the declarations originally said uint64_t.
- We moved the declarations out of <random> so we don't need it anymore. All we need is <yvals.h>.

✅ Notes

I verified (with static_assert(false)) that this codepath is exercised by our test suites.
I verified (with dumpbin /exports) that this doesn't affect the dllexport surface.

<random>: Use _Unsigned128 for linear_congruential_engine by StephanTLavavej · Pull Request #5436 · microsoft/STL (original) (raw)

👨‍🔬 Proof that 128 bits are sufficient

⚙️ Commits

✅ Notes

`<random>`: Use `_Unsigned128` for `linear_congruential_engine` by StephanTLavavej · Pull Request #5436 · microsoft/STL (original) (raw)