Issue 33144: random._randbelow optimization - Python tracker

Created on 2018-03-26 15:00 by wolma, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
randbelow.patch	wolma,2018-03-26 15:00

Pull Requests
URL	Status	Linked	Edit
PR 6291	merged	wolma,2018-03-28 14:49
PR 6563	merged	serhiy.storchaka,2018-04-21 14:35

Messages (16)
msg314455 - (view)	Author: Wolfgang Maier (wolma) *	Date: 2018-03-26 15:00
Given that the random module goes a long way to ensure optimal performance, I was wondering why the check for a match between the random and getrandbits methods is performed per call of Random._randbelow, when it could also be done at instantiation time (the attached patch uses __init_subclass__ for that purpose and, in my hands, gives 10-25% speedups for calls to methods relying on _randbelow). Is it really necessary to guard against someone monkey patching the methods rather than using inheritance?
msg314489 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-03-27 01:25
> it could also be done at instantiation time (the attached patch
> uses __init_subclass__ for that purpose

FWIW, a 10-25% speedup is only possible because the remaining code is already somewhat fast. All that is being proposed is removing a couple of lines that elsewhere would be considered somewhat thin:

    random = self.random
    if type(random) is BuiltinMethod \
       or type(getrandbits) is Method:

Overall, the idea of doing the check only once at instantiation time seems promising. That said, I have unspecific general worries about using __init_subclass__ and patching the subclass. Perhaps Serhiy, Tim, or Mark will have thoughts on whether this sort of self-patching is something we want to be doing in the standard library, whether it would benefit PyPy, and whether it has risks to existing code, to debugging and testing, and to future maintenance.

If I were the one to go the route of making a single pre-check, my instinct would be to just set a flag in __init__, so that the above code would simplify to:

    if self._valid_getrandbits:
        ...
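For illustration, the flag-in-__init__ idea could look roughly like the sketch below. This is not the code that was merged: the class names FlaggedRandom and OnlyRandom and the method randbelow_sketch are hypothetical, and the fallback branch is deliberately simplified (it skips the bias handling of the real random()-based code).

```python
import random
from types import BuiltinMethodType, MethodType


class FlaggedRandom(random.Random):
    """Hypothetical subclass that performs the type check once per instance."""

    def __init__(self, x=None):
        super().__init__(x)
        # Same test as the per-call check quoted above, evaluated only once.
        self._valid_getrandbits = (
            type(self.random) is BuiltinMethodType
            or type(self.getrandbits) is MethodType
        )

    def randbelow_sketch(self, n):
        """Return a random int in [0, n) for n > 0 (simplified)."""
        if self._valid_getrandbits:
            k = n.bit_length()
            r = self.getrandbits(k)
            while r >= n:
                r = self.getrandbits(k)
            return r
        # Simplified fallback for classes that only override random().
        return int(self.random() * n)


class OnlyRandom(FlaggedRandom):
    """Overrides random() in Python, so the flag should end up False."""

    def random(self):
        return 0.5
```

With this sketch, FlaggedRandom() takes the getrandbits branch, while OnlyRandom() falls back to random(). The flag costs one attribute lookup per call instead of two type() comparisons.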
msg314494 - (view)	Author: Tim Peters (tim.peters) *	Date: 2018-03-27 03:10
I don't see anything objectionable about the class optimizing the implementation of a private method. I'll note that there's a speed benefit beyond just removing the two type checks in the common case: the optimized `_randbelow()` also avoids populating its locals with 5 unused formal arguments (which are just "a trick" to change what would otherwise have been global accesses into local accesses). So it actually makes the method implementation cleaner & clearer too. But it's really the speed that matters here.
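The "trick" mentioned here, binding globals or builtins as default arguments so that they become local variables, can be sketched as follows. The function names are hypothetical and only serve to show the pattern; they are not taken from the random module.

```python
def count_ints_global(values):
    # `type` and `int` are resolved through globals/builtins on every pass.
    total = 0
    for v in values:
        if type(v) is int:
            total += 1
    return total


def count_ints_bound(values, type=type, int=int):
    # The defaults are evaluated once, at definition time. Inside the body,
    # `type` and `int` are ordinary local variables (fast lookups), at the
    # price of extra formal parameters that callers never pass.
    total = 0
    for v in values:
        if type(v) is int:
            total += 1
    return total
```

Both functions return the same result. The second one simply carries two additional parameters in its signature, which is the clutter the split implementation gets rid of.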
msg314496 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-03-27 03:30
> the optimized `_randbelow()` also avoids populating its locals
> with 5 unused formal arguments

Yes, that clean-up would be nice as well :-)

Any thoughts on having __init__ set a flag versus using __init_subclass__ to backpatch the subclass? To me, the former looks like plain Python and the latter doesn't seem like something that would normally be done in the standard library.
msg314498 - (view)	Author: Tim Peters (tim.peters) *	Date: 2018-03-27 03:41
I'm the wrong guy to ask about that. Since I worked at Zope Corp, my natural inclination is to monkey-patch everything - but knowing full well that will offend everyone else ;-) That said, this optimization seems straightforward to me: two distinct method implementations for two very different approaches that have nothing in common besides the method name & signature.
msg314502 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-03-27 07:04
I think this is an excellent application of __init_subclass__. It is common to patch an instance method in __init__, but this can create a reference loop if it is patched with another instance method. In this case the choice doesn't depend on the arguments of __init__, and can be made at class creation time.

I like the idea in general, but have comments about the implementation.

__init_subclass__ should take **kwargs and pass it to super().__init_subclass__(). type(cls.random) is not the same as type(self.random). I would use the condition `cls.random is _random.Random.random` instead, or check if the method is in cls.__dict__.

This will break the case when the random or getrandbits methods are patched after class creation or per instance, but I think we have no need to support this.

We could also support the following cases:

1.
    class Rand1(Random):
        def random(self): ...
        # _randbelow should use random()

    class Rand2(Rand1):
        def getrandbits(self): ...
        # _randbelow should use getrandbits()
        # this is broken in the current patch

2.
    class Rand1(Random):
        def getrandbits(self): ...
        # _randbelow should use getrandbits()

    class Rand2(Rand1):
        def random(self): ...
        # _randbelow should use random()
        # this is broken in the current code
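A minimal sketch of the class-creation-time check suggested here, assuming CPython (where the private `_random` module provides the C implementation). The class HookedRandom and its uses_getrandbits attribute are hypothetical names for illustration and do not reproduce the merged patch.

```python
import _random  # CPython's C implementation; assumed to be available
import random


class HookedRandom(random.Random):
    """Hypothetical base class that records the choice once per subclass."""

    uses_getrandbits = True

    def __init_subclass__(cls, **kwargs):
        # Accept and forward keyword arguments, as requested in the review.
        super().__init_subclass__(**kwargs)
        # Identity checks on the class attributes instead of type checks on
        # bound methods: random() untouched, or getrandbits() overridden.
        cls.uses_getrandbits = (
            cls.random is _random.Random.random
            or cls.getrandbits is not _random.Random.getrandbits
        )


class Plain(HookedRandom):
    pass


class OnlyRandom(HookedRandom):
    def random(self):
        return 0.5


class Both(OnlyRandom):
    def getrandbits(self, k):
        return 0
```

In this sketch Plain and Both would use getrandbits(), while OnlyRandom would not. Note that an identity check like this one only looks at the final class attributes, so it cannot distinguish the two inheritance orders listed as cases 1 and 2.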
msg314534 - (view)	Author: Wolfgang Maier (wolma) *	Date: 2018-03-27 15:34
Serhiy:

> I like the idea in general, but have comments about the implementation.
>
> __init_subclass__ should take **kwargs and pass it to super().__init_subclass__(). type(cls.random) is not the same as type(self.random). I would use the condition `cls.random is _random.Random.random` instead, or check if the method is in cls.__dict__.
>
> This will break the case when random or getrandbits methods are patched after class creation or per instance, but I think we have no need to support this.
>

My bad, sorry, and thanks for catching all these issues!

You're absolutely right about the class type checks not being equivalent to the original ones at the instance level. Actually, this is due to the fact that I first moved the checks out of _randbelow and into __init__ just as Raymond would have done and tested this, but then I realized that __init_subclass__ looked just like the right place and moved them again - this time without testing on derived classes again.

From a quick experiment it looks like types.MethodDescriptorType would be the correct type to check cls.random against and types.FunctionType would need to be checked against cls.getrandbits, but that starts to look rather esoteric to me - so you are probably right that something with a cls.__dict__ check or the alternative suggestion of `cls.random is _random.Random.random` are better solutions, indeed.

> We could support also the following cases:
>
> 1.
> class Rand1(Random):
>     def random(self): ...
>     # _randbelow should use random()
>
> class Rand2(Rand1):
>     def getrandbits(self): ...
>     # _randbelow should use getrandbits()
>     # this is broken in the current patch
>

Right, hadn't thought of this situation.

> 2.
> class Rand1(Random):
>     def getrandbits(self): ...
>     # _randbelow should use getrandbits()
>
> class Rand2(Rand1):
>     def random(self): ...
>     # _randbelow should use random()
>     # this is broken in the current code
>

May be worth fixing, too.
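The difference between the class-level and instance-level types described here can be reproduced with a short experiment. This is a sketch assuming CPython 3.7 or later (types.MethodDescriptorType was added in 3.7); the subclass WithGetrandbits is a hypothetical example.

```python
import random
import types


class WithGetrandbits(random.Random):
    """Hypothetical subclass that overrides getrandbits() in Python."""

    def getrandbits(self, k):
        return 0


rng = WithGetrandbits()

# Looked up on the instance: bound methods.
instance_types = (type(rng.random), type(rng.getrandbits))

# Looked up on the class: the underlying descriptor and plain function.
class_types = (
    type(WithGetrandbits.random),
    type(WithGetrandbits.getrandbits),
)

print(instance_types)
print(class_types)
```

On CPython the instance-level pair should be (types.BuiltinMethodType, types.MethodType) and the class-level pair (types.MethodDescriptorType, types.FunctionType), which is why the original per-instance type checks cannot be moved to the class unchanged.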
msg314536 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-03-27 15:41
Wolfgang, can you submit this as a PR?
msg314537 - (view)	Author: Wolfgang Maier (wolma) *	Date: 2018-03-27 15:46
Thanks, Raymond. I'll do that once I've addressed Serhiy's points.
msg314601 - (view)	Author: Wolfgang Maier (wolma) *	Date: 2018-03-28 14:57
So, the PR implements the behaviour suggested by Serhiy as his cases 1 and 2. Case 2 changes existing behaviour: before, it was sufficient to have a user-defined getrandbits anywhere in the inheritance tree, while with the PR it has to be defined more recently than the random method (or within the same class). I'm not 100% sold on this particular aspect, so if you think the old behaviour is better, then that's fine with me. In most real situations it would not make a difference anyway (or do people build complex inheritance hierarchies on top of random.Random?).
msg314602 - (view)	Author: Wolfgang Maier (wolma) *	Date: 2018-03-28 15:01
In addition, I took the opportunity to fix a bug in the original _randbelow in that it would only raise the advertised ValueError on n=0 in the getrandbits-dependent branch, but ZeroDivisionError in the pure random branch.
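To illustrate where the ZeroDivisionError comes from, here is a simplified, standalone sketch of a random()-based branch together with one possible guard. The function names, the random_func parameter, and the guard itself are assumptions for illustration; they are not necessarily what the follow-up issue adopted.

```python
def randbelow_via_random(n, random_func, maxsize=1 << 53):
    """Simplified random()-based branch; fails with ZeroDivisionError for n == 0."""
    rem = maxsize % n  # the modulo is what raises ZeroDivisionError when n == 0
    limit = (maxsize - rem) / maxsize
    r = random_func()
    while r >= limit:
        r = random_func()
    return int(r * maxsize) % n


def randbelow_checked(n, random_func, maxsize=1 << 53):
    """Same computation, with an explicit guard so n == 0 raises ValueError."""
    if n == 0:
        raise ValueError("Boundary cannot be zero")
    return randbelow_via_random(n, random_func, maxsize)
```

Calling randbelow_via_random(0, random.random) fails inside the modulo operation, whereas the guarded variant reports the problem as a ValueError before any arithmetic happens.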
msg314788 - (view)	Author: Wolfgang Maier (wolma) *	Date: 2018-04-01 20:16
ok, I've created issue 33203 to deal with raising ValueError in _randbelow consistently.
msg315397 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-04-17 15:16
New changeset ba3a87aca37cec5b1ee32cf68f4a254fa0bb2bec by Raymond Hettinger (Wolfgang Maier) in branch 'master': bpo-33144: random.Random and subclasses: split _randbelow implementation (GH-6291) https://github.com/python/cpython/commit/ba3a87aca37cec5b1ee32cf68f4a254fa0bb2bec
msg315398 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-04-17 15:19
Possibly, the switch from type checks to identity checks could be considered a bugfix that could be backported. I've always had a lingering worry about that part of the code.
msg315570 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-04-21 14:45
PR 6291 didn't work properly with case 1. Rand2 uses getrandbits() since it is overridden in the parent, despite the fact that random() is defined later. PR 6563 fixes this. It walks the classes in method resolution order and finds the first class that defines random() or getrandbits(). PR 6563 also makes the tests not use logging for testing purposes.
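The method-resolution-order walk described here can be sketched on a toy hierarchy. The helper choose_source and the Base class are hypothetical and independent of the random module. Preferring getrandbits() when one class defines both methods is an assumption of this sketch.

```python
class Base:
    """Toy stand-in for a generator class that defines both methods."""

    def random(self):
        return 0.5

    def getrandbits(self, k):
        return 0


def choose_source(cls):
    """Return the name of the method defined by the first class in the MRO."""
    for c in cls.__mro__:
        if "getrandbits" in c.__dict__:
            return "getrandbits"
        if "random" in c.__dict__:
            return "random"
    return None


# Case 1: random() in the parent, getrandbits() in the child.
class Rand1A(Base):
    def random(self):
        return 0.25


class Rand2A(Rand1A):
    def getrandbits(self, k):
        return 1


# Case 2: getrandbits() in the parent, random() in the child.
class Rand1B(Base):
    def getrandbits(self, k):
        return 1


class Rand2B(Rand1B):
    def random(self):
        return 0.25
```

Here choose_source(Rand2A) yields "getrandbits" and choose_source(Rand2B) yields "random", so the most derived override wins in both orders.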
msg316286 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2018-05-08 12:45
New changeset ec1622d56c80d15740f7f8459c9a79fd55f5d3c7 by Serhiy Storchaka in branch 'master': bpo-33144: Fix choosing random.Random._randbelow implementation. (GH-6563) https://github.com/python/cpython/commit/ec1622d56c80d15740f7f8459c9a79fd55f5d3c7

History
Date	User	Action	Args
2022-04-11 14:58:59	admin	set	github: 77325
2018-05-08 12:47:14	serhiy.storchaka	set	status: open -> closed; resolution: fixed; stage: patch review -> resolved
2018-05-08 12:45:19	serhiy.storchaka	set	messages: +
2018-04-21 14:48:34	rhettinger	set	assignee: rhettinger -> serhiy.storchaka
2018-04-21 14:45:25	serhiy.storchaka	set	status: closed -> open; resolution: fixed -> (no value); messages: +; stage: resolved -> patch review
2018-04-21 14:35:54	serhiy.storchaka	set	pull_requests: + pull_request6258
2018-04-20 20:25:14	rhettinger	set	status: open -> closed; resolution: fixed; stage: patch review -> resolved
2018-04-17 15:19:35	rhettinger	set	messages: +
2018-04-17 15:16:20	rhettinger	set	messages: +
2018-04-01 20:16:38	wolma	set	messages: +
2018-03-28 15:01:18	wolma	set	messages: +
2018-03-28 14:57:43	wolma	set	messages: +
2018-03-28 14:49:06	wolma	set	stage: patch review; pull_requests: + pull_request6015
2018-03-27 15:46:52	wolma	set	messages: +
2018-03-27 15:41:21	rhettinger	set	messages: +
2018-03-27 15:34:59	wolma	set	messages: +
2018-03-27 07:04:37	serhiy.storchaka	set	messages: +
2018-03-27 03:41:05	tim.peters	set	messages: +
2018-03-27 03:30:38	rhettinger	set	messages: +
2018-03-27 03:10:18	tim.peters	set	messages: +
2018-03-27 01:25:45	rhettinger	set	assignee: rhettinger; messages: +; nosy: + tim.peters, mark.dickinson, serhiy.storchaka
2018-03-26 15:00:28	wolma	create