[Python-Dev] Instance variable access and descriptors
Eyal Lotem eyal.lotem at gmail.com
Sat Jun 9 23:23:41 CEST 2007
- Previous message: [Python-Dev] zipfile and unicode filenames
- Next message: [Python-Dev] Instance variable access and descriptors
Hi.
I was surprised to find in my profiling that instance variable access was pretty slow.
I looked through the CPython code involved, and discovered something that really surprised me.
Python, probably on the valid assumption that most attribute lookups go to the class, looks for the attribute in the class first and in the instance second.
What Python currently does is quite peculiar! Here's a short description of PyObject_GenericGetAttr:
A. Python looks for a descriptor in the entire mro hierarchy (len(mro) class/type checks and dict lookups).
B. If Python found a descriptor and it has both __get__ and __set__ functions, it uses it to get the value and returns, skipping the next stage.
C. If Python either did not find a descriptor, or found one that has no setter, it tries a lookup in the instance dict.
D. If Python failed to find the attribute in the instance, it uses the descriptor's getter; if the descriptor has no getter, it returns the descriptor itself.
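As a rough illustration, steps A to D can be modeled in pure Python. This is a simplified sketch, not CPython's actual C implementation: it ignores `__getattr__`, `__slots__` and error details, and the names `generic_getattr`, `NonData` and `Widget` are hypothetical. Following the Python data model, it treats a descriptor whose type defines `__set__` or `__delete__` as a data descriptor.

```python
_MISSING = object()


def generic_getattr(obj, name):
    """Simplified model of the lookup order in steps A to D."""
    tp = type(obj)

    # A. Search every class dict along the MRO for `name`.
    descr = _MISSING
    for klass in tp.__mro__:
        if name in vars(klass):
            descr = vars(klass)[name]
            break

    getter = _MISSING
    if descr is not _MISSING:
        kind = type(descr)
        getter = getattr(kind, "__get__", _MISSING)
        is_data = hasattr(kind, "__set__") or hasattr(kind, "__delete__")
        # B. A data descriptor with a getter wins outright.
        if getter is not _MISSING and is_data:
            return getter(descr, obj, tp)

    # C. Otherwise consult the instance dict.
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]

    # D. Fall back to the descriptor's getter, or the class attribute itself.
    if getter is not _MISSING:
        return getter(descr, obj, tp)
    if descr is not _MISSING:
        return descr
    raise AttributeError(name)


class NonData:
    """Defines only __get__, so an instance attribute can shadow it."""
    def __get__(self, instance, owner):
        return "from-descriptor"


class Widget:
    nd = NonData()
    label = "class-level"

    @property
    def size(self):
        return "from-property"

    @size.setter
    def size(self, value):
        self.__dict__["_size"] = value


w = Widget()
# Write straight into the instance dict, bypassing any setters.
w.__dict__.update(nd="from-instance", size="from-instance", own=1)

print(generic_getattr(w, "nd"))     # from-instance (non-data descriptor loses)
print(generic_getattr(w, "size"))   # from-property (data descriptor wins)
print(generic_getattr(w, "label"))  # class-level
print(generic_getattr(w, "own"))    # 1
```

Each result matches what the built-in `getattr` returns for the same object, which is a quick way to check the model against the real interpreter.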
I believe the average cost of A is much higher than that of C: there is just one instance dict to look through, while there are len(mro) dicts to search for a descriptor. The instance dict is also typically smaller than the class dicts, which matters in the rare worst-case timings of hash lookups.
This means that for simple instance variable lookups, Python is paying the full mro lookup price!
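To make the cost argument concrete, the sketch below counts dict probes rather than measuring time, under the simplifying assumption that every probe costs the same. The helpers `probes_for_instance_attribute` and `make_deep_class` are hypothetical names for this illustration.

```python
def probes_for_instance_attribute(obj, name):
    """Count the dict lookups steps A and C make for an attribute that
    lives only in the instance dict (so step A misses everywhere)."""
    probes = 0
    for klass in type(obj).__mro__:      # step A: probe every class dict
        probes += 1
        if name in vars(klass):
            raise ValueError("%r is also a class attribute" % name)
    probes += 1                          # step C: probe the single instance dict
    if name not in vars(obj):
        raise ValueError("%r is not an instance attribute" % name)
    return probes


def make_deep_class(depth):
    """Build a single-inheritance chain `depth` classes deep."""
    cls = object
    for i in range(depth):
        cls = type("Level%d" % i, (cls,), {})
    return cls


Deep = make_deep_class(10)
obj = Deep()
obj.value = 42
print(len(Deep.__mro__))                            # 11: ten levels plus object
print(probes_for_instance_attribute(obj, "value"))  # 12
```

Under this cost model, an instance-first lookup would need only one probe for the same hit.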
I believe that this should be changed, so that Python first looks for the attribute in the instance's dict and only then through the class's mro.
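A minimal sketch of the proposed order, assuming the only change is to probe the instance dict before walking the MRO. The function `proposed_getattr` and the `Point` class are hypothetical, and the sketch ignores `__getattr__`, `__slots__` and error details.

```python
def proposed_getattr(obj, name):
    """Hypothetical lookup order: instance dict first, then the class MRO."""
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]          # a hit costs a single dict lookup

    # Only on a miss, walk the MRO and honor __get__ where present.
    tp = type(obj)
    for klass in tp.__mro__:
        if name in vars(klass):
            attr = vars(klass)[name]
            getter = getattr(type(attr), "__get__", None)
            return attr if getter is None else getter(attr, obj, tp)
    raise AttributeError(name)


class Point:
    def __init__(self, x):
        self.x = x

    def norm(self):
        return abs(self.x)


p = Point(-3)
print(proposed_getattr(p, "x"))       # -3, found in the instance dict
print(proposed_getattr(p, "norm")())  # 3, bound method found via the MRO
```

Methods and other class attributes still resolve through the MRO; only instance hits skip it.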
This will have the following effects:
A. It will break code that uses instance.__dict__['var'] directly when 'var' also exists as a property with a setter in the class. I believe this is not significant.
B. It will simplify getattr's semantics. Python should always give precedence to instance attributes over class ones, rather than have very weird special cases (such as a property with a setter).
C. It will greatly speed up instance variable access, especially when the class has a large mro.
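The kind of code affected by effect A can be sketched as follows: a property that keeps its value in the instance dict under its own name. The class `Celsius` is hypothetical. Note that in CPython the `property` type always defines `__set__` (a read-only property simply raises `AttributeError` on assignment), so any property is a data descriptor.

```python
class Celsius:
    """Hypothetical class whose property stores its value in the
    instance dict under the property's own name."""

    @property
    def temp(self):
        # The getter adds formatting on top of the stored number.
        return "%.1f C" % self.__dict__["temp"]

    @temp.setter
    def temp(self, value):
        self.__dict__["temp"] = float(value)


c = Celsius()
c.temp = 21                          # runs the setter, which writes __dict__['temp']
print(c.temp)                        # 21.0 C (the getter still wins today)
print(c.__dict__["temp"])            # 21.0
print(hasattr(property, "__set__"))  # True
```

With instance-first lookup, `c.temp` would return `21.0` straight from the instance dict and the getter would never run.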
Obviously, the code breakage is the worst problem. It could probably be addressed by a transition version in which Python warns about any instance attributes that also exist in the mro as descriptors.
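One possible shape for such a transition check, sketched in pure Python. The function `shadowed_data_descriptors`, the choice of `FutureWarning` and the `Celsius` class are all hypothetical. The sketch narrows the check to data descriptors that define `__get__`, since under steps A to D those are the only descriptors whose result would change if instance attributes took precedence.

```python
import warnings


def shadowed_data_descriptors(obj):
    """Return (and warn about) instance-dict names that collide with a
    data descriptor found along the MRO."""
    found = []
    for name in vars(obj):
        for klass in type(obj).__mro__:
            if name not in vars(klass):
                continue
            kind = type(vars(klass)[name])
            is_data = hasattr(kind, "__set__") or hasattr(kind, "__delete__")
            if is_data and hasattr(kind, "__get__"):
                found.append(name)
                warnings.warn(
                    "instance attribute %r is hidden by a data descriptor on %s"
                    % (name, klass.__name__),
                    FutureWarning,
                    stacklevel=2,
                )
            break  # only the first class defining `name` matters, as in step A
    return found


class Celsius:
    @property
    def temp(self):
        return self.__dict__["temp"]

    @temp.setter
    def temp(self, value):
        self.__dict__["temp"] = float(value)


c = Celsius()
c.temp = 21                 # stored under the same name as the property
c.note = "plain attribute"  # no class attribute called "note"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    names = shadowed_data_descriptors(c)

print(names)        # ['temp']
print(len(caught))  # 1
```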
What do you think?