Issue 27056: pickle: constant propagation in _Unpickler_Read() (original) (raw)

Created on 2016-05-19 12:02 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
unpickle_read.patch vstinner,2016-05-19 12:02 review
unpickle_read-2.patch vstinner,2016-05-20 09:26 review
Messages (9)
msg265852 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-19 12:02
According to Linux perf, the unpickle_list benchmark (of the CPython benchmark suite) heavily depends on the performance of load() and _Unpickler_Read() functions. While running benchmarks with PGO+LTO compilation, I noticed a difference around 5% because one build uses constant propation on _Unpickler_Read() (fast), wheras another build doesn't (slow). Depending on the result of the PGO, depending on the OS and the compiler, you may or may not get this nice optimization. I propose to implement it manually to not depend on the compiler. Attached short patch implements manually the optimization. It looks to have a big impact on unpickle_list, but no impact (benchmark is not significant) on fastunpickle and pickle_list: --- $ taskset -c 3 python3 perf.py ../ref_default/rel/python ../default/rel/python -b unpickle_list,fastunpickle,pickle_list --rigorous -v (...) Report on Linux smithers 4.4.9-300.fc23.x86_64 #1 SMP Wed May 4 23:56:27 UTC 2016 x86_64 x86_64 Total CPU cores: 4 ### fastunpickle ### Avg: 0.527359 +/- 0.005932 -> 0.518548 +/- 0.00953: 1.02x faster Not significant ### pickle_list ### Avg: 0.269307 +/- 0.017465 -> 0.266015 +/- 0.00423: 1.01x faster Not significant ### unpickle_list ### Avg: 0.255805 +/- 0.006942 -> 0.231444 +/- 0.00394: 1.11x faster Significant (t=21.58) --- It would be interesting to also evaluate the computed goto optimization for the load() function. (And also try computed goto for the re/_sre module, but that's a different issue.) I tuned my system and patched perf.py (of the CPython benchmark suite) to get stable benchmark results.
msg265858 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-19 13:38
+1. Similar optimization is used in marshal. Added comments on Rietveld.
msg265920 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-20 09:26
Updated patch 2 to address Serhiy's comments.
msg265922 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-20 09:37
I forgot to note that the comment before _Unpickler_Read() becomes not correct, but you already addressed this in the second patch. unpickle_read-2.patch LGTM. Few months ago I tried to optimize reading in pickle, but didn't see significant benefit. After committing your patch I'm going to revive my patch.
msg265925 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-05-20 09:44
New changeset f9b85b47f9c8 by Victor Stinner in branch 'default': Optimize pickle.load() and pickle.loads() https://hg.python.org/cpython/rev/f9b85b47f9c8
msg265926 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-20 09:45
Serhiy Storchaka: "I forgot to note that the comment before _Unpickler_Read() becomes not correct, but you already addressed this in the second patch. unpickle_read-2.patch LGTM." Cool, pushed. "Few months ago I tried to optimize reading in pickle, but didn't see significant benefit. After committing your patch I'm going to revive my patch." Cool too, keep me in touch.
msg265946 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-20 14:07
I think that integer overflow in _Unpickler_Read() is possible. n is read from file and can be arbitrary (up to PY_SSIZE_T_MAX). This likely cause raising an exception later, but integer overflow itself causes undefined behavior, and we should avoid it.
msg265952 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-05-20 19:17
New changeset 3d7b7aa89437 by Victor Stinner in branch 'default': Issue #27056: Fix _Unpickler_Read() to avoid integer overflow https://hg.python.org/cpython/rev/3d7b7aa89437
msg265953 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-20 19:19
Serhiy Storchaka: > I think that integer overflow in _Unpickler_Read() is possible. n is read from file and can be arbitrary (up to PY_SSIZE_T_MAX). This likely cause raising an exception later, but integer overflow itself causes undefined behavior, and we should avoid it. Hum, I understood that it's ok since numbers should be signed, but in fact I'm not confident that n is always signed. You are right, it's better to use your code to avoid the integer overflow. I pushed a fix.
History
Date User Action Args
2022-04-11 14:58:31 admin set github: 71243
2016-05-20 19:19:20 vstinner set messages: +
2016-05-20 19:17:50 python-dev set messages: +
2016-05-20 14:07:49 serhiy.storchaka set messages: +
2016-05-20 09:45:57 vstinner set status: open -> closedresolution: fixedmessages: +
2016-05-20 09:44:03 python-dev set nosy: + python-devmessages: +
2016-05-20 09:37:27 serhiy.storchaka set assignee: vstinnermessages: + stage: commit review
2016-05-20 09:26:27 vstinner set files: + unpickle_read-2.patchmessages: +
2016-05-19 13:38:10 serhiy.storchaka set messages: +
2016-05-19 12:02:29 vstinner create