[Numpy-discussion] NEP: Random Number Generator Policy

Warren Weckesser warren.weckesser at gmail.com
Mon Jun 4 00:23:23 EDT 2018


On Sun, Jun 3, 2018 at 11:20 PM, Ralf Gommers <ralf.gommers at gmail.com> wrote:

On Sun, Jun 3, 2018 at 6:54 PM, <josef.pktd at gmail.com> wrote:

On Sun, Jun 3, 2018 at 9:08 PM, Robert Kern <robert.kern at gmail.com> wrote:

On Sun, Jun 3, 2018 at 5:46 PM <josef.pktd at gmail.com> wrote:

On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern <robert.kern at gmail.com> wrote:

The list of StableRandom methods should be chosen to support unit tests:

* .randint()
* .uniform()
* .normal()
* .standard_normal()
* .choice()
* .shuffle()
* .permutation()
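StableRandom is only proposed at this point, so as a rough sketch of the kind of seeded test this method list is meant to support, here is a minimal example written against today's RandomState (the seed, sizes, and test name are arbitrary, not from any real suite):

    import numpy as np

    def test_shuffle_preserves_values():
        # A fixed seed gives a reproducible stream across test runs.
        rng = np.random.RandomState(1234)
        x = rng.uniform(0.0, 1.0, size=100)
        y = x.copy()
        rng.shuffle(y)
        # Shuffling must not change the multiset of values.
        assert np.allclose(np.sort(x), np.sort(y))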

https://github.com/numpy/numpy/pull/11229#discussion_r192604311

@bashtage writes:
> standard_gamma and standard_exponential are important enough to be included here IMO.

"Importance" was not my criterion, only whether they are used in unit test suites. This list was just off the top of my head for methods that I think are actually used in test suites, so I'd be happy to be shown live tests that use other methods. I'd like to be a little conservative about what methods we stick in here, but we don't have to be too conservative, since we are explicitly never going to be modifying these.

That's one area where I thought the selection is too narrow. We should be able to get a stable stream from the uniform for some distributions. However, according to the Wikipedia description, Poisson doesn't look easy. I just wrote a unit test for statsmodels using Poisson random numbers with hard-coded numbers for the regression tests.

I'd really rather people do this than use StableRandom; this is best practice, as I see it, if your tests involve making precise comparisons to expected results.

I hardcoded the results, not the random data. So the unit tests rely on a reproducible stream of Poisson random numbers. I don't want to save 500 (100 or 1000) observations in a csv file for every variation of the unit test that I run.

I agree, hardcoding numbers in every place where seeded random numbers are now used is quite unrealistic. It may be worth having a look at the test suites for scipy, statsmodels, scikit-learn, etc. and estimating how much work this NEP causes those projects. If the devs of those packages are forced to do large-scale migrations from RandomState to StableRandom, then why not instead keep RandomState and just add a new API next to it?
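To make the pattern concrete: a schematic of such a regression test (not from statsmodels; the seed, numbers, and test name are illustrative) that hard-codes results while still depending on a stable Poisson stream might look like this:

    import numpy as np

    def test_poisson_fit_regression():
        # Reproducible Poisson sample: exactly what breaks if the Poisson
        # stream is not kept stable across numpy versions.
        rng = np.random.RandomState(9876)
        y = rng.poisson(lam=3.0, size=500)

        # Stand-in for the real estimator (a model fit in practice).
        est = y.mean()

        # In the real test, `expected` is a hard-coded number verified once
        # by a slow Monte Carlo and pasted in, e.g.:
        #     expected = 2.994   # placeholder from a verified run
        #     assert abs(est - expected) < 1e-12
        # Here we only check that the stream itself reproduces exactly.
        rng2 = np.random.RandomState(9876)
        assert rng2.poisson(lam=3.0, size=500).mean() == est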

As a quick and imperfect test, I monkey-patched numpy so that a call to numpy.random.seed(m) actually uses m+1000 as the seed.
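A minimal sketch of that kind of monkey-patch (assuming it simply wraps np.random.seed; the wrapper name is made up) is:

    import numpy as np

    _original_seed = np.random.seed
    _OFFSET = 1000  # also ran with 1; see the results below

    def _shifted_seed(seed=None):
        # Perturb every explicit seed so that seeded tests see a
        # different, but still deterministic, random stream.
        if seed is None:
            return _original_seed()
        return _original_seed(seed + _OFFSET)

    np.random.seed = _shifted_seed

I ran the tests using the runtests.py script: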

seed+1000, using 'python runtests.py -n' in the source directory:

236 failed, 12881 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 xpassed

Most of the failures are in scipy.stats:

seed+1000, using 'python runtests.py -n -s stats' in the source directory:

203 failed, 1034 passed, 4 skipped, 370 deselected, 4 xfailed, 1 xpassed

Changing the amount added to the seed, or running the tests using scipy.test("full"), gives different results of similar magnitude:

seed+1000, using 'import scipy; scipy.test("full")' in an ipython shell:

269 failed, 13359 passed, 1271 skipped, 134 xfailed, 8 xpassed

seed+1, using 'python runtests.py -n' in the source directory:

305 failed, 12812 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 xpassed

I suspect many of the tests will be easy to update, so fixing 300 or so tests does not seem like a monumental task. I haven't looked into why there are 585 deselected tests; maybe there are many more tests lurking there that will have to be updated.

Warren

Ralf

StableRandom is intended as a crutch so that the pain of moving existing unit tests away from the deprecated RandomState is less onerous. I'd really rather people write better unit tests! In particular, I do not want to add any of the integer-domain distributions (aside from shuffle/permutation/choice), as these are the ones that have the platform-dependency issues with respect to 32/64-bit long integers. They'd be unreliable for unit tests even if we kept them stable over time.

I'm not sure which other distributions are common enough and not easily reproducible by transformation. E.g. negative binomial can be reproduced by a gamma-Poisson mixture; on the other hand, normal can be easily recreated from standard_normal (a sketch of both follows at the end of this message).

I was mostly motivated by making it a bit easier to mechanically replace uses of randn(), which is probably even more common than normal() and standard_normal() in unit tests.

Would it be difficult to keep this list large, given that it should be frozen, low-maintenance code?

I admit that I had in mind non-statistical unit tests. That is, tests that didn't depend on the precise distribution of the inputs.

The problem is that the unit tests in stats rely on precise inputs (up to some numerical noise). For example, p-values themselves are uniformly distributed if the hypothesis test works correctly. That means if I don't have control over the inputs, then my p-value could be anything in (0, 1). So either we need a real dataset, save all the random numbers in a file, or have a reproducible set of random numbers.

95% of the unit tests that I write are for statistics. A large fraction of them don't rely on the exact distribution, but do rely on random numbers that are "good enough". For example, when writing a unit test, every once in a while (or sometimes more often) I get a "bad" stream of random numbers, for which convergence might fail or where the estimated numbers are far away from the true numbers, so the test tolerance would have to be very high. If I pick one of the seeds that looks good, then I can have a tighter unit test tolerance to ensure results are good in a nice case.

The problem is that we cannot write robust regression unit tests without stable inputs. E.g. I verified my results with a Monte Carlo with 5000 replications and 1000 Poisson observations in each. Results look close to expected and won't depend much on the exact stream of random variables. But the Monte Carlo for each variant of the test took about 40 seconds. Doing this for every option combination and dataset specification takes too long to be feasible in a unit test suite. So I rely on numpy's stable random numbers and hard-code the results for a specific random sample in the regression unit tests.

Josef

--
Robert Kern
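As referenced above, a small sketch of the two transformations mentioned (seed, sizes, and parameters are arbitrary; note the mixture still consumes the Poisson sampler, so it demonstrates the distributional identity rather than a stable stream):

    import numpy as np

    rng = np.random.RandomState(12345)
    n, p, size = 5, 0.3, 200000

    # Gamma-Poisson mixture: if lam ~ Gamma(shape=n, scale=(1-p)/p) and
    # X | lam ~ Poisson(lam), then X ~ NegativeBinomial(n, p).
    lam = rng.standard_gamma(n, size=size) * (1 - p) / p
    x = rng.poisson(lam)

    # Direct sampler for comparison (same distribution, different stream).
    y = rng.negative_binomial(n, p, size=size)
    print(x.mean(), y.mean())  # both near n * (1 - p) / p = 11.67

    # normal(mu, sigma) is an affine transform of standard_normal, which
    # is why freezing only the latter would suffice.
    mu, sigma = 2.0, 0.5
    z = mu + sigma * rng.standard_normal(size)
    print(z.mean(), z.std())  # near 2.0 and 0.5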






NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion


