Issue 28813: Remove unneeded folded consts after peephole (original) (raw)
The attached patch adds new logic to peephole compiler to remove constants that are no longer needed after the main peephole pass.
For example:
def f():
var = 'te' + 'xt'
num = -12
num = -6 * 2
return (1, (3, 4), 6)
print(f.__code__.co_consts)
Without the patch:
(None, 'te', 'xt', 12, 6, 2, 1, 3, 4, 'text', -12, -6, -12, (3, 4), (1, (3, 4), 6))
With patch:
(None, 'text', -12, -12, (1, (3, 4), 6))
(unfortunately, I couldn't get rid of None because that would make 'text' a docstring)
For convenience, I've written the patch in two parts. The first one just changes the CONST_STACK_* macros to store the co_const indices instead of the constants themselves, the second one is the actual implementation of the new logic.
Aside from simply having to store less objects around, this also makes co_consts contents closer together. This may help the cache a little bit.
I did run benchmarks multiple times, but it looked like all the results were random noise. That makes sense, since I didn't directly affect the runtime. The only consistently faster benchmark is:
- regex_dna: 288 ms +- 7 ms -> 275 ms +- 5 ms: 1.05x faster
I tried to measure the difference in compile time, but it too was lost in the noise.
I also compared size of compiled .pyc files in the Lib/ directory. The gains are mostly very small.
_compat_pickle.cpython-37.pyc | 6554 -> 5851 | -10.7% sre_compile.cpython-37.pyc | 10275 -> 10025 | -2.43% hashlib.cpython-37.pyc | 6624 -> 6514 | -1.66% pstats.cpython-37.pyc | 21755 -> 21435 | -1.47% _markupbase.cpython-37.pyc | 7979 -> 7864 | -1.44% pydoc.cpython-37.pyc | 83899 -> 82712 | -1.41% _strptime.cpython-37.pyc | 15951 -> 15751 | -1.25% future.cpython-37.pyc | 4155 -> 4105 | -1.2% opcode.cpython-37.pyc | 5401 -> 5341 | -1.11% colorsys.cpython-37.pyc | 3299 -> 3263 | -1.09% signal.cpython-37.pyc | 2503 -> 2478 | -0.999% _osx_support.cpython-37.pyc | 9663 -> 9568 | -0.983% gettext.cpython-37.pyc | 13990 -> 13854 | -0.972% getpass.cpython-37.pyc | 4223 -> 4183 | -0.947% compare.cpython-37.pyc | 541 -> 536 | -0.924% warnings.cpython-37.pyc | 13328 -> 13208 | -0.9% platform.cpython-37.pyc | 27931 -> 27681 | -0.895% imaplib.cpython-37.pyc | 42019 -> 41653 | -0.871% webbrowser.cpython-37.pyc | 15836 -> 15702 | -0.846% this.cpython-37.pyc | 1253 -> 1243 | -0.798% rlcompleter.cpython-37.pyc | 5768 -> 5723 | -0.78% zipfile.cpython-37.pyc | 48024 -> 47672 | -0.733% imghdr.cpython-37.pyc | 4138 -> 4108 | -0.725% turtle.cpython-37.pyc | 131600 -> 130653 | -0.72% timeit.cpython-37.pyc | 11676 -> 11596 | -0.685% lzma.cpython-37.pyc | 11980 -> 11900 | -0.668% bz2.cpython-37.pyc | 11270 -> 11195 | -0.665% aifc.cpython-37.pyc | 25821 -> 25651 | -0.658% gzip.cpython-37.pyc | 16215 -> 16110 | -0.648% uuid.cpython-37.pyc | 20382 -> 20260 | -0.599% plistlib.cpython-37.pyc | 27354 -> 27191 | -0.596% cProfile.cpython-37.pyc | 4199 -> 4174 | -0.595% tarfile.cpython-37.pyc | 62437 -> 62076 | -0.578% sysconfig.cpython-37.pyc | 15819 -> 15728 | -0.575% profile.cpython-37.pyc | 13889 -> 13814 | -0.54% random.cpython-37.pyc | 19177 -> 19074 | -0.537% _threading_local.cpython-37.pyc | 6609 -> 6574 | -0.53% _dummy_thread.cpython-37.pyc | 4839 -> 4814 | -0.517% datetime.cpython-37.pyc | 53722 -> 53445 | -0.516% tracemalloc.cpython-37.pyc | 17218 -> 17131 | -0.505% // remaining 129 files are < 0.5% smaller, 33 of them didn't change their size
Thank you for your patch Adrian. I haven't close look, but at first glance your patch looks correct, and the idea looks great.
But moving constant folding from the peephole optimizer to the AST level (, ) would totally eliminate the need in your patch. I'll push your patch if AST optimizer will be not implemented in 3.7.
On other hand, your patch looks simple enough, and my be pushed first.
It would be easy to review if provide your changes as one patch.
FWIW, we intentionally decided not to do this when constant folding was added. The idea was to keep the peephole optimizer simple and to have it do the minimum work necessary to get its job done (optimizing the constants table takes extra time to do but doesn't result in faster code).
Another reason was that aside from contrived examples (such as the OP's example), very little real-world code gets any benefit and the benefit tends to be very small. (In other words, no one will actually notice or benefit from this patch, but their compilation times will all slow down slightly).
Lastly, the intention is to stop building out constant folding. The correct place for constant folding is upstream, using AST prior to code generation.