[Numpy-discussion] NEP: Random Number Generator Policy
Robert Kern robert.kern at gmail.com
Mon Jun 4 18:18:25 EDT 2018
On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
It may be worth having a look at test suites for scipy, statsmodels, scikit-learn, etc. and estimate how much work this NEP causes those projects. If the devs of those packages are forced to do large scale migrations from RandomState to StableState, then why not instead keep RandomState and just add a new API next to it?
The problem is that we can't really have an ecosystem with two different general-purpose systems. To properly use pseudorandom numbers, I need to instantiate a PRNG and thread it through all of the code in my program: both the parts that I write and the third-party libraries that I don't write.
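As a rough illustration of what "threading" a PRNG means, here is a minimal sketch that uses only the Python standard library rather than numpy. The function names (simulate_noise, bootstrap_mean, run_experiment) are hypothetical and stand in for third-party and application code; the point is only that every function draws from the PRNG object it is handed and never from global state.

```python
import random


def simulate_noise(n, rng):
    """Library-style function: draws only from the PRNG it is handed."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def bootstrap_mean(data, n_resamples, rng):
    """Another function that uses the caller's PRNG, not global state."""
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return means


def run_experiment(seed):
    """Application code: instantiate one PRNG and pass it everywhere."""
    rng = random.Random(seed)
    data = simulate_noise(100, rng)
    return bootstrap_mean(data, 10, rng)
```

Because the single PRNG instance is the only source of randomness, two runs with the same seed give the same result regardless of what other code does with the module-level generator.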
Generating test data for unit tests is separable, though. That's why I propose having a StableRandom built on the new architecture. Its purpose would be well-documented, and in my proposal it is limited in features so that it is less likely to be abused outside of that purpose. If you make it fully-featured, it is more likely to be abused by building library code around it. But even if it is so abused, because it is built on the new architecture, at least I can thread the same core PRNG state through the StableRandom distributions from the abusing library and use the better distributions class elsewhere (randomgen names it "Generator"). Just keeping RandomState around can't work like that because it doesn't have a replaceable core PRNG.
But that does suggest another alternative that we should explore:
The new architecture separates the core uniform PRNG from the wide variety of non-uniform probability distributions. That is, the core PRNG state is encapsulated in a discrete object that can be shared between instances of different distribution-providing classes. numpy.random should provide two such distribution-providing classes. The main one (let us call it Generator, as it is called in the prototype) will follow the new policy: distribution methods can break the stream in feature releases. There will also be a secondary distributions class (let us call it LegacyGenerator) which contains distribution methods exactly as they exist in the current RandomState implementation. When one combines LegacyGenerator with the MT19937 core PRNG, it should reproduce the exact same stream as RandomState for all distribution methods. The LegacyGenerator methods will be forever frozen.
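The separation described above can be sketched in plain Python. This is a toy model, not the numpy or randomgen implementation: CorePRNG, LegacyDistributions, and NewDistributions are hypothetical names, and the standard library's Mersenne Twister stands in for the MT19937 core PRNG. It shows only that two distribution classes can share one core state object while using different algorithms.

```python
import math
import random


class CorePRNG:
    """Hypothetical core uniform PRNG; wraps the stdlib Mersenne Twister."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def random(self):
        # Uniform float in [0.0, 1.0).
        return self._rng.random()


class LegacyDistributions:
    """Stand-in for LegacyGenerator: these algorithms would be frozen."""

    def __init__(self, core):
        self.core = core

    def exponential(self, scale=1.0):
        # Inversion; 1 - u lies in (0, 1], so the logarithm is defined.
        return -scale * math.log(1.0 - self.core.random())


class NewDistributions:
    """Stand-in for Generator: algorithms may change between releases."""

    def __init__(self, core):
        self.core = core

    def exponential(self, scale=1.0):
        # A deliberately different algorithm: same distribution,
        # different stream of values for the same core state.
        u = self.core.random()
        while u == 0.0:
            u = self.core.random()
        return -scale * math.log(u)
```

Passing the same CorePRNG instance to both classes means a draw through either one advances the single shared state, which is what lets one PRNG be threaded through code that mixes the two.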
numpy.random.RandomState() will instantiate a LegacyGenerator with the MT19937 core PRNG, and whatever tricks are needed to make isinstance(prng, RandomState) and unpickling work should be done. This way of creating the LegacyGenerator by way of RandomState will be deprecated, becoming progressively noisier over a number of release cycles, in favor of explicitly instantiating LegacyGenerator.
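One possible shape for such a shim, sketched with toy classes in plain Python: the LegacyGenerator and RandomState classes below are hypothetical stand-ins, not numpy's, and making RandomState a thin subclass is only one of the "tricks" that could keep isinstance checks working. Unpickling is not addressed here.

```python
import random
import warnings


class LegacyGenerator:
    """Toy stand-in for the frozen distributions class."""

    def __init__(self, seed=None):
        # The stdlib Mersenne Twister plays the role of the core PRNG.
        self._core = random.Random(seed)

    def random_sample(self):
        return self._core.random()


class RandomState(LegacyGenerator):
    """Deprecated spelling: same behavior, plus a warning on construction."""

    def __init__(self, seed=None):
        warnings.warn(
            "RandomState is deprecated; instantiate LegacyGenerator instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(seed)
```

In this sketch, "progressively noisier" could mean moving from DeprecationWarning, which Python hides by default outside of __main__ and test runners, to a warning category that is always shown.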
LegacyGenerator CAN be used during this deprecation period in library and application code until libraries and applications can migrate to the new Generator. Libraries and applications SHOULD migrate but MUST NOT be forced to. LegacyGenerator CAN be used to generate test data for unit tests where cross-release stability of the streams is important. Test writers SHOULD consider ways to mitigate their reliance on such stability and SHOULD limit their usage to distribution methods that have fewer cross-platform stability risks.
-- Robert Kern