Issue 4701: range objects becomes hashable after attribute access (original) (raw)

Issue4701

Created on 2008-12-19 15:21 by hagen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
remove_casts.patch hagen,2008-12-19 16:50 remove unnecessary casts in other places
rangehash3.patch hagen,2008-12-19 18:09
Messages (20)
msg78059 - (view) Author: Hagen Fürstenau (hagen) Date: 2008-12-19 15:21
As reported by Dmitry Vasiliev on python-dev, a range object suddenly becomes hash()able after an attribute access, e.g. by dir(). If I understand correctly, then the reason is that PyRange_Type doesn't set tp_hash and PyType_Ready is not normally called on the type, but only e.g. in PyObject_GenericGetAttr. I don't see any use for range objects being hashable, as they don't even have meaningful equality defined on them. So I'd recommend making them unhashable. The attached patch does this and adds a test.
msg78060 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2008-12-19 15:51
You don't need to cast PyObject_HashNotImplemented to hashfunc
msg78061 - (view) Author: Hagen Fürstenau (hagen) Date: 2008-12-19 16:02
Why does every other place seem to do the cast? Historical reasons?
msg78062 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2008-12-19 16:09
On Fri, Dec 19, 2008 at 2:02 PM, Hagen Fürstenau <report@bugs.python.org> wrote: > > Hagen Fürstenau <hfuerstenau@gmx.net> added the comment: > > Why does every other place seem to do the cast? Historical reasons? > No, if you look at the functions being casted you will notice them do not take a pointer to PyObject as the first argument, if you are talking about tp_hash specifically.
msg78063 - (view) Author: Hagen Fürstenau (hagen) Date: 2008-12-19 16:14
I'm talking about places like these: [hagenf@chage py3k]$ grep -R "(hashfunc)PyObject_HashNotImplemented" Objects/*.c Modules/*.c Objects/dictobject.c: (hashfunc)PyObject_HashNotImplemented, /* tp_hash */ Objects/listobject.c: (hashfunc)PyObject_HashNotImplemented, /* tp_hash */ Objects/setobject.c: (hashfunc)PyObject_HashNotImplemented, /* tp_hash */ Modules/_collectionsmodule.c: (hashfunc)PyObject_HashNotImplemented, /* tp_hash */
msg78064 - (view) Author: Guilherme Polo (gpolo) * (Python committer) Date: 2008-12-19 16:27
On Fri, Dec 19, 2008 at 2:14 PM, Hagen Fürstenau <report@bugs.python.org> wrote: > > Hagen Fürstenau <hfuerstenau@gmx.net> added the comment: > > I'm talking about places like these: > > [hagenf@chage py3k]$ grep -R "(hashfunc)PyObject_HashNotImplemented" > Objects/*.c Modules/*.c > Objects/dictobject.c: (hashfunc)PyObject_HashNotImplemented, /* > tp_hash */ > Objects/listobject.c: (hashfunc)PyObject_HashNotImplemented, /* > tp_hash */ > Objects/setobject.c: (hashfunc)PyObject_HashNotImplemented, /* > tp_hash */ > Modules/_collectionsmodule.c: (hashfunc)PyObject_HashNotImplemented, > /* tp_hash */ I have checked some of the examples you gave and noticed they were re-added (or changed) recently, but all these casts could be removed.
msg78065 - (view) Author: Hagen Fürstenau (hagen) Date: 2008-12-19 16:50
Here's an updated patch without the cast and a separate patch for removing the other casts.
msg78069 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008-12-19 21:24
The origin of the unnecessary hashfunc casts is just me following some of the more specific examples of filling in the tp_hash slot too closely without checking if the cast was still needed. I'll apply and backport Hagen's patches to 3.0 soon (as well as fixing some other non-hashable types such as slice() to use PyHash_NotImplemented), but first I want to understand why range() exhibits this behaviour, while other classes with a superficially similar implementation (such as bytearray) do not.
msg78071 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008-12-19 22:01
OK, the discrepancy with bytearray turns out to be fairly straightforward: bytearray overrides the comparison operations, so inheritance of the default object.__hash__ is automatically blocked. range() objects don't support comparison, so they inherit __hash__ when PyType_Ready is called. Which then begs the question of why range() instances are unhashable only until something happens to invoke the tp_getattro slot on the type object... and it turns out that PyType_Ready isn't called on the type during interpreter startup. Instead, it only happens lazily when one of the operations that needs the tp_dict to be filled in calls PyType_Ready (the default dir() retrieves __dict__ from the type object, and this attribute access causes PyType_Ready to be called). Only at this point is the slot inheritance on the range() type calculated correctly.
msg78073 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-12-19 22:43
Patch looks good to me.
msg78086 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008-12-20 06:51
It has been pointed out to me that xrange() objects are hashable in 2.x, and since these range objects are immutable, I don't believe there is any driving reason for them not to be hashable. At that point the question becomes one of why xrange() is being initialised correctly in 2.x while something is going wrong with the tp_hash slot initialisation for range() in 3.x that doesn't get fixed until PyType_Ready is called to populate tp_dict.
msg78087 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008-12-20 06:54
Bumping to release blocker for 3.0.1 - until I understand this better, I'm not sure if the xrange->range hashing behaviour change between 2.x and 3.x is a type specific problem or a general issue in the implementation of the new approach to tp_hash inheritance for Py3k.
msg78088 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008-12-20 07:14
Ah, I think I figured it out - in 2.x, PyObject_Hash itself includes the fallback to _PyHash_Pointer if none of tp_hash, tp_compare or the tp_richcompare slots have been implemented on the type. So long as a type is only trying to inherit object.__hash__ (as is the case with xrange), then this fallback will do the right thing if PyType_Ready hasn't been called yet. In 3.0, on the other hand, PyObject_Hash has no fallback - if tp_hash isn't filled in, the type isn't considered hashable. This means that for a type to properly inherit hashability in Py3k, PyType_Ready *must* be called on it. Probably the best thing to do is to add xrange and range to the list of types initialised in _Py_ReadyTypes in 2.x and 3.x respectively.
msg78089 - (view) Author: Hagen Fürstenau (hagen) Date: 2008-12-20 08:15
> I don't believe there is any driving reason for them not to be hashable. On the other hand, what is the use case for hashing objects whose equality is defined as object identity? Python 3.0 (r30:67503, Dec 4 2008, 06:47:38) [GCC 4.3.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> range(10).__hash__ <method-wrapper '__hash__' of range object at 0x7f2d61dbd210> >>> {range(10), range(10)} {range(0, 10), range(0, 10)} I can see only confusion arising from that.
msg78108 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008-12-20 14:19
Hagen Fürstenau wrote: > Hagen Fürstenau <hfuerstenau@gmx.net> added the comment: > >> I don't believe there is any driving reason for them not to be hashable. > > On the other hand, what is the use case for hashing objects whose > equality is defined as object identity? Fast membership testing in sets to track which objects you have and haven't seen, mapping from objects to arbitrary metadata about those objects without having to explicitly redirect through id(), that kind of thing. Generally speaking, the idea is that objects should be hashable if their concept of "equality" cannot change over the life time of the object. Identity based equality meets that criteria, which is why objects (including range/xrange) are hashable by default.
msg78401 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008-12-28 12:28
enumerate can be added to the list of builtin types which isn't initialised correctly, as can the callable+sentinel iterator return from the 2-argument version of iter() and the default sequence iterator returned by iter() when given a type with both __len__ and __getitem__ methods, but no __iter__ method.
msg78402 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008-12-28 12:46
Copied from python-dev post: Perhaps the path of least resistance is to change PyObject_Hash to be yet another place where PyType_Ready will be called implicitly if it hasn't been called already? That approach would get us back to the Python 2.x status quo where calling PyType_Ready was only absolutely essential if you wanted to correctly inherit a slot from a type other than object itself.
msg78423 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2008-12-28 17:33
> Perhaps the path of least resistance is to change PyObject_Hash to be > yet another place where PyType_Ready will be called implicitly if it > hasn't been called already? I think that's the best thing to do. It would bring PyObject_Hash in line with PyObject_Format, for example. If we pick some other solution (instead of modifying PyObject_Hash), then we should find all of the lazy calls to PyType_Ready (that exist to solve this problem) and remove them. IOW, it should be handled the same everywhere.
msg78496 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008-12-30 01:39
Fixed using a lazy call to PyType_Ready in PyObject_Hash: 2.7: r68051 2.6: r68052 Forward-port to Py3k still to come.
msg78511 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008-12-30 07:41
Forward port to 3.x: 3.1: r68058 3.0: r68060
History
Date User Action Args
2022-04-11 14:56:43 admin set github: 48951
2009-01-29 11:41:40 jcea set nosy: + jcea
2008-12-30 07:41:30 ncoghlan set status: open -> closedresolution: fixedmessages: +
2008-12-30 01:39:50 ncoghlan set messages: +
2008-12-28 17:33:18 eric.smith set messages: +
2008-12-28 12:46:57 ncoghlan set messages: +
2008-12-28 12:28:28 ncoghlan set messages: +
2008-12-20 14:19:40 ncoghlan set messages: +
2008-12-20 08:15:38 hagen set messages: +
2008-12-20 07:14:38 ncoghlan set messages: +
2008-12-20 06:54:35 ncoghlan set priority: release blockerresolution: accepted -> (no value)messages: + versions: - Python 3.1
2008-12-20 06:51:27 ncoghlan set messages: +
2008-12-20 02:07:58 rhettinger set messages: -
2008-12-20 02:07:08 rhettinger set nosy: + rhettingermessages: +
2008-12-19 22:43:16 amaury.forgeotdarc set resolution: acceptedmessages: + nosy: + amaury.forgeotdarc
2008-12-19 22:01:32 ncoghlan set messages: +
2008-12-19 21:24:06 ncoghlan set assignee: ncoghlanmessages: + nosy: + ncoghlan
2008-12-19 18:09:37 hagen set files: - rangehash2.patch
2008-12-19 18:09:31 hagen set files: + rangehash3.patch
2008-12-19 18:00:12 eric.smith set nosy: + eric.smith
2008-12-19 16:50:40 hagen set files: - rangehash.patch
2008-12-19 16:50:36 hagen set files: + remove_casts.patch
2008-12-19 16:50:09 hagen set files: + rangehash2.patchmessages: +
2008-12-19 16:27:17 gpolo set messages: +
2008-12-19 16:14:09 hagen set messages: +
2008-12-19 16:09:48 gpolo set messages: +
2008-12-19 16:02:23 hagen set messages: +
2008-12-19 15:51:16 gpolo set nosy: + gpolomessages: +
2008-12-19 15:22:16 hagen set files: + rangehash.patchkeywords: + patch
2008-12-19 15:21:10 hagen create