[Numpy-discussion] NEP: Random Number Generator Policy

Robert Kern robert.kern at gmail.com
Sun Jun 3 20:21:56 EDT 2018


Moving some of the Github PR comments here:

Implementation
--------------

We propose first freezing RandomState as it is and developing a new RNG subsystem alongside it. This allows anyone who has been relying on our old stream-compatibility guarantee to have plenty of time to migrate. RandomState will be considered deprecated, but with a long deprecation cycle, at least a few years.

https://github.com/numpy/numpy/pull/11229#discussion_r192604195 @bashtage writes:

RandomState could pretty easily be spun out into a stand-alone package, if useful. It is effectively a stand-alone submodule already.

Indeed. That would be a graceful forever-home for the code for anyone who needs it. However, I'd still only make that switch after at least a few years of deprecation inside numpy. And maybe a 2.0.0 release.

Any new design for the RNG subsystem will provide a choice of different core uniform PRNG algorithms. We will be more strict about a select subset of methods on these core PRNG objects. They MUST guarantee stream-compatibility for a minimal, specified set of methods which are chosen to make it easier to compose them to build other distributions. Namely,

* .bytes()
* .random_uintegers()

BTW, random_uintegers() is a new method in Kevin Sheppard's randomgen, and I am referring to its semantics here. https://github.com/bashtage/randomgen/blob/master/randomgen/generator.pyx#L191

https://github.com/numpy/numpy/pull/11229#discussion_r192604275 @bashtage writes:

One of these (bytes, uintegers) seems redundant. uintegers should probably be 64 bit.

Because different core generators have different "native" outputs (MT19937 and PCG32 output uint32s, PCG64 outputs uint64s, and some that I hope we never implement output doubles natively), there are some simple, but non-trivial choices to make to support each of these. I would like the core generator's author to make those choices and maintain them. They're not hard, but they are the kind of thing that ought to be decided once and consistently.

I am of the opinion that uintegers should support at least uint32 and uint64 as those are the most common native outputs among core generators. There should be a maintained way to get that native format (and yes, I'd rather have the user be explicit about it than have random_native_uint() in addition to random_uint64()).

This argument extends to .bytes(), too, now that I think about it. A stream of bytes is a native format for some generators, too, for example if we decide to hook up /dev/urandom or another file-backed interface.
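To make the "decided once and consistently" point concrete, here is a minimal sketch of a core generator whose native output is uint32 and which fixes its own rules for producing uint64s and bytes. Everything in it is an assumption for illustration: ToyCore32 and next_uint32() are hypothetical names, xorshift32 is only a toy stand-in for a real core PRNG such as MT19937 or PCG32, and the high-word-first and little-endian conventions are arbitrary choices, not what randomgen does.

```python
import struct


class ToyCore32:
    """Toy xorshift32 generator standing in for a core PRNG with native
    uint32 output. Hypothetical, for illustration only."""

    def __init__(self, seed):
        # xorshift32 needs a nonzero 32-bit state.
        self.state = (seed & 0xFFFFFFFF) or 1

    def next_uint32(self):
        # One native draw: the classic 13/17/5 xorshift32 step.
        x = self.state
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        self.state = x
        return x

    def random_uintegers(self, n, bits=64):
        # The caller states the width explicitly; 32 is the native format.
        if bits == 32:
            return [self.next_uint32() for _ in range(n)]
        if bits == 64:
            out = []
            for _ in range(n):
                # Assumed convention: first draw is the high word.
                hi = self.next_uint32()
                lo = self.next_uint32()
                out.append((hi << 32) | lo)
            return out
        raise ValueError("bits must be 32 or 64")

    def bytes(self, n):
        # Assumed convention: little-endian words, surplus bytes discarded.
        words = -(-n // 4)  # ceiling division
        raw = b"".join(struct.pack("<I", self.next_uint32())
                       for _ in range(words))
        return raw[:n]
```

With these rules pinned down, ToyCore32(42).random_uintegers(1, bits=64) is by construction the first two 32-bit draws from the same seed glued together, so anything built on top of either width sees one consistent stream.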

Hmm, what do you think about adding random_interval() to this list? And raising that up to the Python API level (a la what Python 3 did with exposing secrets.randbelow() as a primitive)? https://github.com/bashtage/randomgen/blob/master/randomgen/src/distributions/distributions.c#L1164-L1200

Many, many uses of this method would be with numbers much less than 1<<32 (e.g. Fisher-Yates shuffle). For the 32-bit native PRNGs, that could mean using half as many core PRNG draws, provided random_interval() is implemented along with the core PRNG to make use of that fact.
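A rough sketch of the draw-count argument, using masked rejection sampling as one plausible way to implement random_interval(). I am not claiming this matches the linked C code in detail. CountingCore32, next_uint32(), next_uint64() and shuffle() are hypothetical names, and the xorshift32 core is a toy stand-in for a real 32-bit native PRNG.

```python
class CountingCore32:
    """Toy xorshift32 core that counts native 32-bit draws.
    Hypothetical, for illustration only."""

    def __init__(self, seed):
        self.state = (seed & 0xFFFFFFFF) or 1  # state must be nonzero
        self.draws = 0

    def next_uint32(self):
        x = self.state
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        self.state = x
        self.draws += 1
        return x

    def next_uint64(self):
        # A 64-bit value costs two native draws on a 32-bit core.
        return (self.next_uint32() << 32) | self.next_uint32()


def random_interval(core, max_value):
    """Return an integer drawn from [0, max_value], inclusive."""
    if max_value == 0:
        return 0
    # Smallest all-ones bit mask that covers max_value.
    mask = (1 << max_value.bit_length()) - 1
    # One native draw per attempt if the bound fits in 32 bits, else two.
    draw = core.next_uint32 if max_value <= 0xFFFFFFFF else core.next_uint64
    while True:
        value = draw() & mask
        if value <= max_value:  # reject values above the bound and retry
            return value


def shuffle(core, items):
    """In-place Fisher-Yates shuffle built on random_interval()."""
    for i in range(len(items) - 1, 0, -1):
        j = random_interval(core, i)
        items[i], items[j] = items[j], items[i]
```

The rejection rate depends only on the mask, so for any bound below 1<<32 the 32-bit path spends one native draw per attempt where a 64-bit-only path would spend two. The draws counter on the toy core makes that easy to check.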

The list of StableRandom methods should be chosen to support unit tests:

* .randint()
* .uniform()
* .normal()
* .standard_normal()
* .choice()
* .shuffle()
* .permutation()

https://github.com/numpy/numpy/pull/11229#discussion_r192604311 @bashtage writes:

standard_gamma and standard_exponential are important enough to be included here IMO.

"Importance" was not my criterion, only whether they are used in unit test suites. This list was just off the top of my head for methods that I think were actually used in test suites, so I'd be happy to be shown live tests that use other methods. I'd like to be a little conservative about what methods we stick in here, but we don't have to be too conservative, since we are explicitly never going to be modifying these.

--
Robert Kern


