`<random>`: Use `_Unsigned128` for `linear_congruential_engine` by StephanTLavavej · Pull Request #5436 · microsoft/STL
I noticed that we had a single remaining usage of separately compiled multiprecision arithmetic machinery. I believe that we can replace this with the modern `_Unsigned128` type, which is far better tested and more widely exercised. As @AlexGuteniev observed, `_Unsigned128` uses intrinsics, so it should also be faster than multprec.cpp. And as a final bonus, it makes this codepath strongly resemble the 32-bit and 64-bit codepaths above.
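As a rough illustration of what this codepath computes, here is a minimal Python sketch of one `linear_congruential_engine` step with a 128-bit intermediate. It is not the STL's implementation: `lcg_next`, `MASK64`, and `MASK128` are hypothetical names, and Python's arbitrary-precision `int` masked to 128 bits stands in for `_Unsigned128`.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def lcg_next(x, a, c, m):
    """Return (a * x + c) mod m using an intermediate truncated to 128 bits.

    Hypothetical helper: a, c, and m must fit in 64 bits, m must be nonzero,
    and x must be a valid state, i.e. x < m.
    """
    assert 0 <= a <= MASK64 and 0 <= c <= MASK64
    assert 0 < m <= MASK64 and 0 <= x < m
    exact = a * x + c        # unbounded Python integer
    wide = exact & MASK128   # what a 128-bit unsigned type would hold
    assert wide == exact     # nothing was truncated, so 128 bits suffice
    return wide % m


# minstd_rand parameters (a = 48271, c = 0, m = 2^31 - 1), seeded with 1.
x = 1
for _ in range(2):
    x = lcg_next(x, 48271, 0, 2147483647)
print(x)  # 182605794
```

The inner `assert wide == exact` is the point of the sketch: for any 64-bit `a`, `c`, and `m`, the masked value equals the exact one, so no wider type is needed.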
👨🔬 Proof that 128 bits are sufficient
The function comment says:
    // Choose intermediate type:
    // To use type T for the intermediate calculation, we must show
    // _Ax * (_Mx - 1) + _Cx <= numeric_limits<T>::max()
    // Split _Cx:
    // _Cx <= numeric_limits<T>::max()
    // && _Ax * (_Mx - 1) <= numeric_limits<T>::max() - _Cx
    // Divide by _Ax:
    // _Cx <= numeric_limits<T>::max()
    // && (_Mx - 1) <= (numeric_limits<T>::max() - _Cx) / _Ax
The first part, `_Cx <= (2^128 - 1)`, is obviously true, because `_Cx` is at most `2^64 - 1`.
The next part is `(_Mx - 1) <= ((2^128 - 1) - _Cx) / _Ax`. This is hardest to satisfy when the LHS is largest (the largest `_Mx`) and the RHS is smallest (the largest `_Ax` and the largest `_Cx`), each of which is at most `2^64 - 1`. So `((2^64 - 1) - 1) <= ((2^128 - 1) - (2^64 - 1)) / (2^64 - 1)` is the worst case.
C:\Temp>py
Python 3.13.3 (tags/v3.13.3:6280bb5, Apr 8 2025, 14:47:33) [MSC v.1943 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ((2**64 - 1) - 1)
18446744073709551614
>>> ((2**128 - 1) - (2**64 - 1)) // (2**64 - 1)
18446744073709551616
>>> ((2**64 - 1) - 1) <= ((2**128 - 1) - (2**64 - 1)) // (2**64 - 1)
True
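The same check can be written as a small script rather than a REPL session. The sketch below is only an illustration (the names `fits` and `U64_MAX` are hypothetical): it restates the divided-form inequality from the comment, confirms the worst case for a 128-bit intermediate, and shows that a 64-bit intermediate would fail the same worst case. The quotient comes out exact because 2^128 - 2^64 = 2^64 * (2^64 - 1).

```python
U64_MAX = 2**64 - 1


def fits(bits, a, c, m):
    """Check the comment's condition for an unsigned type with `bits` bits.

    Hypothetical helper; assumes a >= 1, as in the "Divide by _Ax" step.
    """
    t_max = 2**bits - 1
    return c <= t_max and (m - 1) <= (t_max - c) // a


# Worst case: the largest multiplier, increment, and modulus that fit in 64 bits.
print(fits(128, U64_MAX, U64_MAX, U64_MAX))  # True
print(fits(64, U64_MAX, U64_MAX, U64_MAX))   # False

# The divided form agrees with the undivided form a * (m - 1) + c <= max.
assert U64_MAX * (U64_MAX - 1) + U64_MAX <= 2**128 - 1
```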
⚙️ Commits
- `_Uint _Cx` always satisfies `_Cx <= ULLONG_MAX`.
- Use `_Unsigned128`, preserve multprec.cpp for bincompat.
  - We can drop `_INLINE_VAR` from `_MP_len`, giving it internal linkage within multprec.cpp.
  - We don't need `extern "C++"` on the declarations - that was for modules.
  - We need to add `_CRTIMP2_PURE` to the definitions. I thought we had harmonized all declarations and definitions, but we missed these.
- Reduce multprec.cpp dependencies to slightly improve throughput.
  - We can hardcode `64` instead of including `<limits>` for `numeric_limits<unsigned long long>::digits`.
  - `uint64_t` and `unsigned long long` are synonyms. In fact, this file was already using `unsigned long long` for the definitions, even though the declarations originally said `uint64_t`.
  - We moved the declarations out of `<random>` so we don't need it anymore. All we need is `<yvals.h>`.
✅ Notes
- I verified (with `static_assert(false)`) that this codepath is exercised by our test suites.
- I verified (with `dumpbin /exports`) that this doesn't affect the dllexport surface.