Issue 28866: Type cache is not correctly invalidated on a class defining mro() (original) (raw)

Issue28866

Created on 2016-12-04 14:35 by sjpalt, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
weird.py sjpalt,2016-12-04 14:35 Steps to reproduce
weird_without_interactive.py mdk,2016-12-04 21:34
issue28866.diff mdk,2016-12-05 21:59 review
Pull Requests
URL Status Linked Edit
PR 13117 closed mdk,2019-05-06 16:07
PR 13157 merged mdk,2019-05-07 15:07
PR 13589 merged miss-islington,2019-05-26 21:52
PR 17573 open petr.viktorin,2019-12-11 12:31
Messages (22)
msg282344 - (view) Author: Steve Palt (sjpalt) Date: 2016-12-04 14:35
I ran into a strange bug while experimenting with metaclasses that implement a custom mro() method. It only seems to occur in interactive mode, either returning completely unrelated values or causing a segfault. Python 2 appears unaffected. I have been able to reproduce this with the following code: # $ python -i weird.py # >>> proxy.x # 52011448 # >>> proxy.x # 6160 # ... class Foo: pass class Meta(type): def mro(cls): return (cls, Foo, object) def __setattr__(cls, name, value): setattr(Foo, name, value) proxy = Meta('FooProxy', (), {}) proxy.x = 300 proxy.x # don't omit proxy.x = 0
msg282345 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-04 15:00
Python 2 is affected if make Foo a new style class.
msg282349 - (view) Author: Anilyka Barry (abarry) * (Python triager) Date: 2016-12-04 15:40
Additionally, trying to access an attribute before assigning it will result in an AttributeError on all subsequent accesses (even if set). I didn't manage to get a segfault, however.
msg282353 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2016-12-04 17:59
FWIW, in _PyType_Lookup I see the "300" PyLong being cached, and later, just before the segfault, I see its address getting out of the cache (a cache hit) but it's no longer a PyLong, it's scrambled, so we're getting a real pointer with scrambled values on the line: attribute = _PyType_Lookup(type, name); segfaulting two lines later in: descrgetfunc local_get = Py_TYPE(attribute)->tp_descr_get When I write scrambled value I mean: (gdb) p *attribute $21 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = -2604246222170760229, ob_type = 0xdbdbdbd\ bdbdbdbdb} To debug interactive session in GDB I used: r -i weird.py < stdin with "proxy.x" in the stdin file.
msg282368 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2016-12-04 21:34
I'm able to reproduce the segmentation fault outside the interactive mode, see attached file.
msg282375 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2016-12-04 22:51
Problem looks mainly due to the __setattr__ done on a different type than the getattr, and the cache of PyType_Lookup: - type_setattro will be called with Foo, "x", and the value to set - type_getattro will be called with FooProxy, "x" The timeline (with my weird_without_interactive.py) is: - type_setattro is called on Foo for attr "x" with the instance of Bar - type_getattro is called on FooProxy for attr "x" (the `proxy.x` line) During this type_getattro, the instance of Bar is cached in PyType_Lookup for (FooProxy, "x") - type_setattro is called on Foo for attr "x" with the value 0 During this call, insertdict will legitimately call DECREF on the Bar instance, which is deleted - type_getattro is called on FooProxy with attr "x" for the print() During this call, PyType_Lookup will cache hit for (FooProxy, x) with the old reference to the instance of Bar
msg282430 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-12-05 16:28
If you call sys._clear_type_cache() after the setattr(), weird_without_interactive.py doesn't crash anymore.
msg282468 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2016-12-05 21:59
I found a way to fix it, but as I'm just discovering cpython internals, I'm not sure it's the right way, review are very welcome. I fixed it this way because: The bytecode generated for "proxy.x = 0" (a STORE_ATTR) will call a PyObject_SetAttr with FooProxy which will soon execute the __setattr__ of Meta, itself calling a PyObject_SetAttr with Foo. The actual attribute setting is done from the PyObject_SetAttr with Foo (calling in turn type_setattro, and so on), but it's already too late to invalidate the FooProxy type: we no longer have a reference on it and can't guess that FooProxy delegated __setattr__ to Foo. So the only place when we can invalidate correctly is in the first call of PyObject_SetAttr when we know on which type the attribute will be set, and it make sense: It does not matter what a __setattr__ does and how it does it: invalidation should happen for this attribute on the current type, so invalidating here seems logic. I did not yet took the time to measure performance loss induced with this patch. With the patch: ./python -i weird.py >>> proxy.x 0 >>> proxy.x 0
msg282485 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-12-05 23:13
The issue is that the Meta class has a reference to the class Foo in its mro() method, but Foo is not aware of Meta. So when Foo is modified, the Foo cache is invalidated, but not Meta cache. .diff always invalidates the cache, so it works. But it is suboptimal, IMO it defeats the whole purpose of a cache. I never defined a mro() method. I'm not sure that it's possible to have a type cache and a mro() method? Options: * Disable completely the cache on classes defining mro() * Modify "Meta" (the C code implementing the type, not the Python code) to track the version tag of each class referenced by mro(). Problem: mro() is dynamic!? * Somehow, notify Foo that Meta has a reference to its cache, so Foo is able to invalidate Meta cache
msg282486 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-12-05 23:30
> * Somehow, notify Foo that Meta has a reference to its cache, so Foo is able to invalidate Meta cache This was implemented when the type cache was implemented in Python 2.6, but only for explicit subclasses. PyType_Modified() iterates on tp_subclasses. PyType_Ready() updates tp_subclasses: it stores a weak reference to sub classes in each base class. I understand that, if we want to implement this feature, type_mro_modified() should be modified to add a backward reference in each base class of the MRO. type_mro_modified() is called when a type is defined, but also when type.__bases__ is explicitly modified. It would require to add a new slot to types, and so increase a little bit the memory usage, and slow down the creation of a type, and type.__bases__ (slow down: probably negligible, O(1) since the existing tp_subclasses uses a dict).
msg282598 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2016-12-07 07:06
> .diff always invalidates the cache, so it works. But it is suboptimal, IMO it defeats the whole purpose of a cache. Not sure about defeating the purpose of the cache as I only invalidate in setattr, getattr are still cache hitting. I tried: $ python3 -m performance run --python=./python --rigorous -b all -o master.json $ git checkout ; make -j 8 $ python3 -m performance run --python=./python --rigorous -b all -o .json $ python3 -m performance compare master.json .json And I don't see much differences, probably only noise as I get some faster tests: ### call_method_unknown ### Median +- Std dev: 56.8 ms +- 3.3 ms -> 52.9 ms +- 1.6 ms: 1.08x faster Significant (t=12.92) ### pybench.IfThenElse ### Median +- Std dev: 247 ns +- 3 ns -> 224 ns +- 16 ns: 1.11x faster Significant (t=15.15) and some slower: ### pybench.StringPredicates ### Median +- Std dev: 1.89 us +- 0.05 us -> 2.18 us +- 0.21 us: 1.15x slower Significant (t=-15.20) ### unpack_sequence ### Median +- Std dev: 207 ns +- 4 ns -> 231 ns +- 27 ns: 1.12x slower Significant (t=-11.63) I'm not yet accustomed to this perf suite, so I may miss something obvious. I'll ran it again on master to measure the noise and should probably fine-tune my system for stability. I'll also try a benchmark without the cache for comparison.
msg288072 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2017-02-18 10:43
Hi, Tried again, this time getting some stats with MCACHE_STATS 1, to check if my patch is defeating the cache: Without my patch: $ time ./python performance/benchmarks/bm_chaos.py --worker -l1 -w0 -n1 --filename chaos.ppm --width=512 --height=512 --iterations 50000 chaos: Median: 2.51 sec -- Method cache hits = 16581735 (99%) -- Method cache true misses = 4092 (0%) -- Method cache collisions = 28542 (0%) -- Method cache size = 96 KB With my patch: $ time ./python performance/benchmarks/bm_chaos.py --worker -l1 -w0 -n1 --filename chaos.ppm --width=512 --height=512 --iterations 50000 chaos: Median: 2.53 sec -- Method cache hits = 16582260 (99%) -- Method cache true misses = 4096 (0%) -- Method cache collisions = 28012 (0%) -- Method cache size = 96 KB
msg341581 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2019-05-06 18:08
This failure appears to be a symptom of recursively traversing __bases__ rather scanning __mro__ in the implementation of type.__subclasses__ The MCACHE depends on type.__subclasses__ being correct and it is not correct for weird.py python -i weird.py >>> C = Meta("C", (), {}) >>> C.__mro__ (<class '__main__.C'>, <class '__main__.Foo'>, <class 'object'>) >>> Foo.__subclasses__() [] >>> C.__bases__ (<class 'object'>,) Fixing this may need a change in the API for type.__subclasses__() to return all subclasses, as defined by __mro__, not just the bases. A simpler, temporary fix might be to set Py_TPFLAGS_HAVE_VERSION_TAG to 0 for any class that has a custom mro()
msg343550 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2019-05-26 14:25
New changeset 180dc1b0f4a57c3f66351568ae8488fa8576d7f0 by Mark Shannon (Julien Palard) in branch 'master': bpo-28866: No type cache for types with specialized mro, invalidation is hard. (#13157) https://github.com/python/cpython/commit/180dc1b0f4a57c3f66351568ae8488fa8576d7f0
msg343581 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-26 20:56
Well done Julien 😉
msg343582 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-26 20:59
Python 2.7 and 3 7 should be fixed as well, no?
msg343583 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2019-05-26 21:50
I think I can backport it to 3.7, and I let you choose if 2.7 need this.
msg343585 - (view) Author: miss-islington (miss-islington) Date: 2019-05-26 22:14
New changeset bfd0b7720196b9ff647cc33dafbd31a04496402c by Miss Islington (bot) in branch '3.7': [3.7] bpo-28866: No type cache for types with specialized mro, invalidation is hard. (GH-13157) (GH-13589) https://github.com/python/cpython/commit/bfd0b7720196b9ff647cc33dafbd31a04496402c
msg343650 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-27 15:51
Python 2.7 is also affected according to Serhiy. Does someone want to write a backport to 2.7?
msg355377 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-10-25 19:48
#38541 reports a severe attribute access slowdown in 3.7.4 in some situations. (Fixed by the enhancement in #36922, which I presume cannot be backported.)
msg358257 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2019-12-11 11:37
A refcount problem possibly caused by the fix: https://bugs.python.org/issue39016
msg358258 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2019-12-11 12:36
Steve, I would like to add your reproducer (weird.py) to CPython's test suite, to make sure this doesn't happen again. Would you be willing to sign the contributor agreement, so we can use your code in Python? The instructions are here: https://www.python.org/psf/contrib/contrib-form/
History
Date User Action Args
2022-04-11 14:58:40 admin set github: 73052
2019-12-11 12:36:42 petr.viktorin set messages: +
2019-12-11 12:31:19 petr.viktorin set pull_requests: + <pull%5Frequest17047>
2019-12-11 11:37:01 petr.viktorin set nosy: + petr.viktorinmessages: +
2019-10-25 19:48:01 terry.reedy set nosy: + terry.reedymessages: +
2019-05-27 15:51:13 vstinner set messages: +
2019-05-26 22:14:38 miss-islington set nosy: + miss-islingtonmessages: +
2019-05-26 21:52:01 miss-islington set pull_requests: + <pull%5Frequest13496>
2019-05-26 21:50:15 mdk set messages: +
2019-05-26 20:59:23 vstinner set messages: +
2019-05-26 20:56:02 vstinner set messages: +
2019-05-26 14:25:52 Mark.Shannon set messages: +
2019-05-07 15:07:08 mdk set pull_requests: + <pull%5Frequest13074>
2019-05-06 18:08:20 Mark.Shannon set nosy: + Mark.Shannonmessages: +
2019-05-06 16:07:07 mdk set stage: patch reviewpull_requests: + <pull%5Frequest13031>
2017-02-18 10:43:24 mdk set messages: +
2016-12-07 07:06:00 mdk set messages: +
2016-12-05 23:30:05 vstinner set messages: +
2016-12-05 23:13:53 vstinner set messages: + title: Unexpected behavior resulting from mro() and __setattr__ in interactive mode -> Type cache is not correctly invalidated on a class defining mro()
2016-12-05 21:59:04 mdk set files: + issue28866.diffkeywords: + patchmessages: +
2016-12-05 16:28:04 vstinner set messages: +
2016-12-05 13:36:36 vstinner set nosy: + vstinner
2016-12-04 22:51:04 mdk set messages: +
2016-12-04 21:34:53 mdk set files: + weird_without_interactive.pymessages: +
2016-12-04 17:59:12 mdk set nosy: + mdkmessages: +
2016-12-04 15:45:33 serhiy.storchaka set versions: + Python 2.7
2016-12-04 15:40:23 abarry set nosy: + abarrymessages: +
2016-12-04 15:00:05 serhiy.storchaka set priority: normal -> highversions: + Python 3.6, Python 3.7, - Python 3.4nosy: + serhiy.storchakamessages: +
2016-12-04 14:35:45 sjpalt create