msg198987 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2013-10-05 14:07 |
The attached patch (which can be applied on both trunk and 2.7) gives a huge speed improvement for the case 'pow(huge_number, smallish_number, smallish_number)'. The improvement is unbounded: I get 20x with 'pow(x, y, z)' with the arguments 'x = 3 ** 10000, y = 10 ** 51 - 2, z = 10 ** 51' but increasing x just increases the factor. This is inspired by https://github.com/pyca/ed25519: check out revision 9f3e838d90ded42a86ec74c5e9f5e37dec8122a0, run it with 'time python -u signfast.py < sign.input'. This patch gives around 14% improvement. So it's a case that occurs in practice. |
|
|
msg198991 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2013-10-05 17:02 |
Good idea! The patch looks almost ready to me: the comment block before the code block should be updated, since recomputing `base` is no longer being done _just_ to force `base` to a non-negative value. |
|
|
msg198995 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2013-10-05 17:58 |
A bit of history: last time I fiddled that code, I didn't worry about this, because for large enough exponents all internal numbers _eventually_ become less than `base`. But the patch can speed up the _startup_ costs by an arbitrary amount (for smaller exponents it's _all_ "startup costs", while for larger exponents there are 31 multiplications by `base` to precompute a 5-bits-a-time table). Of course there's no problem with correctness here: `base` and `base % modulus` are equivalent in this algorithm. |
|
|
msg198996 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2013-10-05 17:59 |
Grr: should be: "all internal numbers _eventually_ become less than `modulus`", not "less than `base`". |
|
|
msg199003 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2013-10-05 21:18 |
New patch changes the comments to match the new code. |
|
|
msg199014 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-10-05 22:14 |
New changeset f34c59494420 by Tim Peters in branch '3.3': Issue #19171: speed some cases of 3-argument long pow(). http://hg.python.org/cpython/rev/f34c59494420 New changeset 6fcdd1657ee3 by Tim Peters in branch 'default': Issue #19171: speed some cases of 3-argument long pow(). http://hg.python.org/cpython/rev/6fcdd1657ee3 New changeset 101bf827611a by Tim Peters in branch '2.7': Issue #19171: speed some cases of 3-argument long pow(). http://hg.python.org/cpython/rev/101bf827611a |
|
|
msg199175 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2013-10-08 07:48 |
Hmm. I thought 2.7 (and 3.3, for that matter) was in bugfix mode only? |
|
|
msg199176 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2013-10-08 07:51 |
> I thought 2.7 (and 3.3, for that matter) was in bugfix mode only? It would be crazy to not apply this little fix-up. |
|
|
msg199177 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2013-10-08 08:13 |
> It would be crazy to not apply this little fix-up. Crazy? How so? Note that this change, while introducing a performance enhancement in some rather unlikely corner cases, also introduces a performance regression in some other unlikely corner cases: Before the patch (2.7): iwasawa:cpython mdickinson$ ./python.exe -m timeit -s "a=7**10000; b=0; c=23" "pow(a, b, c)" 1000000 loops, best of 3: 0.232 usec per loop After the patch: iwasawa:cpython mdickinson$ ./python.exe -m timeit -s "a=7**10000; b=0; c=23" "pow(a, b, c)" 100000 loops, best of 3: 13.8 usec per loop That can be easily fixed with more special-casing and more code, but I don't think this sort of experimentation is appropriate for a bugfix branch. |
|
|
msg199217 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2013-10-08 16:21 |
I'll revert the 2.7 change if people agree that's a good thing. I'm fine with it as-is. Armin pulled the idea from timing a Python public-key crypto project (see the original message in this report), where he found a 14% improvement. I don't care if the trivial exponent == 0 case slows down - that's _truly_ unlikely ;-) The time spent special-casing it would marginally slow down other cases without good reason. For any exponent other than 0, reduction by the base must be done. |
|
|
msg199230 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2013-10-08 18:30 |
No need to revert. The improvement seems like a good one; I was just a bit surprised to see it land in the maintenance branches as well as the default branch. My understanding was that minor performance improvements aren't normally candidates for inclusion in 2.7. Maybe Benjamin can clarify the policy here. > I don't care if the trivial exponent == 0 case slows down [...] Sure; I guess my point was that even the simplest change can have unexpected / unintended consequences, which is one of the reasons that it makes sense to me to avoid non-bugfix changes in 2.7 / 3.3. (We're not totally without use-cases for special-casing pow(a, 0, b), by the way: such a pow operation occurs any time you do `hash(Decimal(some_integer))`, though admittedly not with an oversized a.) |
|
|
msg199235 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2013-10-08 18:47 |
I'm glad you pointed it out, Mark! You're right about unintended consequences, and I confess I didn't think at all about the exponent == 0 case. I didn't remind myself that 2.7 was a bugfix branch either: I read Armin's "(which can be applied on both trunk and 2.7)" and reflexively checked it in. I'd _say_ I'd be more careful about that in the future, but since I still use 2.7.5 for most of my private work I wouldn't believe me either ;-) |
|
|
msg199240 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2013-10-08 19:45 |
In general, we like to touch 2.7 as little as possible. I'm not sure it's worth arguing about this (admittely small) change meets the bar. |
|
|