Issue 25856: The module attribute of non-heap classes is not interned (original) (raw)

Created on 2015-12-13 16:17 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
intern___module__.patch serhiy.storchaka,2015-12-18 16:49 review
intern_and_cache___module__.patch serhiy.storchaka,2015-12-29 08:23 review
intern_and_cache___module__2.patch serhiy.storchaka,2016-09-07 11:50 review
intern_and_cache___module__3.patch serhiy.storchaka,2016-09-07 18:19 review
Messages (15)
msg256322 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-13 16:17
One of purposes of the STACK_GLOBAL opcode introduced in pickle protocol 4 is to avoid repeating module name for different globals of the same module. >>> pickletools.dis(pickletools.optimize(pickle.dumps([sys.getsizeof, sys.intern], 4))) 0: \x80 PROTO 4 2: \x95 FRAME 33 11: ] EMPTY_LIST 12: ( MARK 13: \x8c SHORT_BINUNICODE 'sys' 18: \x94 MEMOIZE (as 0) 19: \x8c SHORT_BINUNICODE 'getsizeof' 30: \x93 STACK_GLOBAL 31: h BINGET 0 33: \x8c SHORT_BINUNICODE 'intern' 41: \x93 STACK_GLOBAL 42: e APPENDS (MARK at 12) 43: . STOP highest protocol among opcodes = 4 But this doesn't work with the itertools module. >>> pickletools.dis(pickletools.optimize(pickle.dumps([itertools.chain, itertools.accumulate], 4))) 0: \x80 PROTO 4 2: \x95 FRAME 47 11: ] EMPTY_LIST 12: ( MARK 13: \x8c SHORT_BINUNICODE 'itertools' 24: \x8c SHORT_BINUNICODE 'chain' 31: \x93 STACK_GLOBAL 32: \x8c SHORT_BINUNICODE 'itertools' 43: \x8c SHORT_BINUNICODE 'accumulate' 55: \x93 STACK_GLOBAL 56: e APPENDS (MARK at 12) 57: . STOP highest protocol among opcodes = 4 That is because the __module__ attribute of itertools members is not interned. >>> sys.getsizeof.__module__ is sys.intern.__module__ True >>> itertools.chain.__module__ is itertools.chain.__module__ False In addition to inefficient pickle this perhaps leads to small performance hit on accessing the __module__ attribute or using its value as dictionary key.
msg256694 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-18 16:49
Actually this is not specific to itertools. Every non-heap class not from the builtins module is affected. Proposed patch just interns the __module__ value. The patch also cleans up the code.
msg257171 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-29 07:43
Interning the __module__ string causes small performance hit: $ ./python -m timeit -s "from itertools import chain" -- "chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__" Unpatched: 1.93 usec per loop Patched: 4.09 usec per loop This can be avoided if cache created string in type's __dict__. Following patch makes __module__ retrieving for non-heap types as fast as for heap types: Patched2: 0.871 usec per loop
msg274799 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-07 11:43
Could anyone please make a review?
msg274842 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-07 17:18
I don't understand this code: type->tp_dict && PyDict_Check(type->tp_dict) since the code explicitly assume it's not NULL and access it as a dict earlier in the function
msg274857 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-07 18:19
Good catch Benjamin! There is yet one bug -- the type type already has the __module__ attribute, but it is a descriptor, not a string. Updated patch fixes these bugs.
msg274861 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-07 18:30
I don't understand why you need to check the validity of tp_dict at all. We generally assume it's a dict.
msg274862 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-07 18:36
Indeed, the PyDict_Check() check can be omitted.
msg274863 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-07 18:40
I don't think it can be NULL either. On Wed, Sep 7, 2016, at 11:36, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > > Indeed, the PyDict_Check() check can be omitted. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue25856> > _______________________________________
msg274865 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-07 18:45
It can be NULL in very rare cases. See for example .
msg275166 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-08 21:29
I think it's better not write into the type dict in a getter. You might just use PyUnicode_InternFromString every time. FWIW, __name__ also has this behavior.
msg275175 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-08 21:39
This is what my first path does. But this slows down retrieving the __module__ attribute (from 0.2 usec to 0.4 usec on my computer). Maybe I haven't bother. Interning __name__ and __qualname__ is less important, because different functions and classes usually have different names.
msg275176 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-08 21:42
I'm not too worried about slowing down __module__ especially since it's not any slower for heap types or types in builtins. On Thu, Sep 8, 2016, at 14:39, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > > This is what my first path does. But this slows down retrieving the > __module__ attribute (from 0.2 usec to 0.4 usec on my computer). Maybe I > haven't bother. > > Interning __name__ and __qualname__ is less important, because different > functions and classes usually have different names. > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue25856> > _______________________________________
msg275460 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-09 21:55
New changeset 861ddad3e0c1 by Serhiy Storchaka in branch 'default': Issue #25856: The __module__ attribute of extension classes and functions https://hg.python.org/cpython/rev/861ddad3e0c1
msg275464 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-09 21:58
Thank you for helpful review Benjamin.
History
Date User Action Args
2022-04-11 14:58:24 admin set github: 70043
2016-09-09 21:58:55 serhiy.storchaka set status: open -> closedresolution: fixedmessages: + stage: patch review -> resolved
2016-09-09 21:55:21 python-dev set nosy: + python-devmessages: +
2016-09-08 21:42:50 benjamin.peterson set messages: +
2016-09-08 21:39:35 serhiy.storchaka set messages: +
2016-09-08 21:29:36 benjamin.peterson set messages: +
2016-09-07 18:45:24 serhiy.storchaka set messages: +
2016-09-07 18:40:12 benjamin.peterson set messages: +
2016-09-07 18:36:12 serhiy.storchaka set messages: +
2016-09-07 18:30:00 benjamin.peterson set messages: +
2016-09-07 18:19:55 serhiy.storchaka set files: + intern_and_cache___module__3.patchmessages: +
2016-09-07 17🔞43 benjamin.peterson set nosy: + benjamin.petersonmessages: +
2016-09-07 11:50:55 serhiy.storchaka set files: + intern_and_cache___module__2.patch
2016-09-07 11:43:10 serhiy.storchaka set assignee: serhiy.storchakamessages: +
2015-12-29 08:23:12 serhiy.storchaka set files: + intern_and_cache___module__.patch
2015-12-29 08:22:54 serhiy.storchaka set files: - intern_and_cache___module__.patch
2015-12-29 07:43:41 serhiy.storchaka set files: + intern_and_cache___module__.patchtype: performance -> enhancementmessages: +
2015-12-18 16:49:06 serhiy.storchaka set files: + intern___module__.patchtitle: The __module__ attribute of itertools members is not interned -> The __module__ attribute of non-heap classes is not internedmessages: + keywords: + patchstage: patch review
2015-12-13 16:17:44 serhiy.storchaka create