msg277371 - (view) |
Author: Oren Milman (Oren Milman) * |
Date: 2016-09-25 12:38 |
------------ current state ------------ In Objects/longobject.c, the function maybe_small_long first checks whether v (the received PyLongObject pointer) is not NULL. However, currently in every call to maybe_small_long, it is already guaranteed that v is not NULL, which makes that check redundant. (Currently, the following are the only functions that call maybe_small_long: * PyLong_FromString * long_divrem * long_rshift * long_lshift * long_bitwise) With regard to relevant changes made in the past, maybe_small_long remained quite the same since it was added, in changeset 48567 (https://hg.python.org/cpython/rev/1ce7e5c5a761) - in particular, the check (whether v is not NULL) was always there. When it was added, both long_rshift and long_lshift might have called maybe_small_long with v as NULL, which seems like the reason for adding the check back then. ------------ proposed changes ------------ In Objects/longobject.c in maybe_small_long, remove the check whether v is not NULL, and add an 'assert(v != NULL);'. ------------ diff ------------ The proposed patches diff file is attached. ------------ tests ------------ I ran 'python_d.exe -m test -j3' (on my 64-bit Windows 10) with and without the patches, and got quite the same output. The outputs of both runs are attached. |
|
|
msg277408 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2016-09-26 08:12 |
Thanks for the patch. I'm a bit reluctant to make changes like this unless there's a measurable performance benefit. The extra check in maybe_small_long does have a value in terms of readability and safety with respect to future maintenance, and I'd expect a competent compiler to inline maybe_small_long and elide the extra check (though I haven't looked at the assembly output to check that). So I think we shouldn't change this unless there's a clear benefit to doing so. |
|
|
msg277409 - (view) |
Author: Oren Milman (Oren Milman) * |
Date: 2016-09-26 10:19 |
I looked at the assembly, and saw maybe_small_long is not inlined, but I agree that other compilers (or future compilers) might introduce some optimization that would elide the check. Also, as expected (thanks to branch prediction, I guess), a little microbenchmark shows my patch doesn't improve performance: without my patch: python.exe -m perf timeit "122 >> 2" .................... Median +- std dev: 20.5 ns +- 0.5 ns python.exe -m perf timeit "215 << 1" .................... Median +- std dev: 20.6 ns +- 0.5 ns with my patch: python.exe -m perf timeit "122 >> 2" .................... Median +- std dev: 20.6 ns +- 0.3 ns python.exe -m perf timeit "215 << 1" .................... Median +- std dev: 20.6 ns +- 0.4 ns |
|
|
msg277417 - (view) |
Author: Antti Haapala (ztane) * |
Date: 2016-09-26 13:23 |
Totally inlined when built with GCC, release mode, there is no trace of the maybe_small_long symbol, I couldn't even really decipher where it was being done in the disassembly of longobject.c. Was this the Windows release build? |
|
|
msg277418 - (view) |
Author: Oren Milman (Oren Milman) * |
Date: 2016-09-26 13:31 |
my build (release): Python 3.7.0a0 (default:da2c96cf2ce6, Sep 26 2016, 13:08:47) [MSC v.1900 32 bit (Intel)] on win32 ISTM we can close this issue. |
|
|
msg288783 - (view) |
Author: Oren Milman (Oren Milman) * |
Date: 2017-03-02 07:57 |
ping (just to close the issue, I think) |
|
|
msg288784 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-03-02 08:15 |
Try to bench list(range(256)). It should create small ints in tight loop. |
|
|
msg288789 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-03-02 09:58 |
issue28272_ver1.diff LGTM. Mark Dickinson: "The extra check in maybe_small_long does have a value in (...) safety with respect to future maintenance," "assert(v != NULL);" has the same purpose. I think that it's ok to require that maybe_small_long() isn't call with NULL. The function is called 5 times and is never called with NULL. Note: long_normalize() cannot fail, it modifies the object in-place and returns the input object. Mark Dickinson: "I'm a bit reluctant to make changes like this unless there's a measurable performance benefit." I'm unable to see any impact on performance. Following microbenchmark is not significant: --- haypo@smithers$ ./python -m perf timeit -s 'x=1' 'x<<1>>1<<1>>1<<1>>1<<1>>1<<1>>1<<1>>1<<1>>1<<1>>1' --duplicate=100 --compare-to ../master-ref/python --python-names=ref:patch ref: ..................... 531 ns +- 10 ns patch: ..................... 531 ns +- 2 ns Median +- std dev: [ref] 531 ns +- 10 ns -> [patch] 531 ns +- 2 ns: 1.00x faster (-0%) --- Note: python.exe -m perf timeit "122 >> 2" doesn't test long_rshirt() since "122 >> 2" is computed by the Python compiler, not at runtime ;-) |
|
|
msg288790 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-03-02 10:00 |
> Try to bench list(range(256)). It should create small ints in tight loop. This expression doesn't call maybe_small_long(). |
|
|
msg288865 - (view) |
Author: Oren Milman (Oren Milman) * |
Date: 2017-03-03 07:14 |
So, should I open a pull request? (as some time had passed, I would also run again the tests, etc.) |
|
|
msg288875 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-03-03 10:26 |
> So, should I open a pull request? I consider that Mark Dickinson is the maintained of longobject.c, so please for his feedback before going further. It seems like Mark consider that the change is worthless and bad for maintainability/sustainability. |
|
|
msg288878 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2017-03-03 11:19 |
Without a demonstrable performance improvement, I'm opposed to making this change. Closing. |
|
|