[Python-Dev] pickle output not unique
Kristján Valur Jónsson kristjan at ccpgames.com
Tue Aug 3 21:38:47 CEST 2010
Hi there. I was made aware of this oddity here:

    import cPickle
    reffed = "xKITTENSx"[1:-1]
    print repr(cPickle.dumps(reffed))
    print repr(cPickle.dumps("xKITTENSx"[1:-1]))

These strings are different, presumably because of the (ob_refcnt == 1) optimization used during object pickling. This might come as a surprise to many, and is potentially dangerous if pickling is being used as a precursor to some hashing function. For example, we use code that caches function calls, using something akin to:
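To see where two such pickles differ, the opcode stream can be inspected. The following is a minimal sketch, assuming a Python 3 interpreter where the module is named `pickle` rather than `cPickle`; the helper `memo_writes` and the set `MEMO_WRITES` are hypothetical names introduced for illustration. Whether the two dumps actually differ depends on the interpreter and pickler in use, so the output is not shown.

```python
import pickle
import pickletools

# Opcodes that store the object just pickled into the memo table.
MEMO_WRITES = {"PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"}

def memo_writes(data):
    """Return the names of the memo-storing opcodes found in a pickle."""
    return [op.name for op, arg, pos in pickletools.genops(data)
            if op.name in MEMO_WRITES]

reffed = "xKITTENSx"[1:-1]

# Dump the same value twice: once through a named reference, once as a
# temporary, and list the memo-storing opcodes in each result.
for data in (pickle.dumps(reffed, protocol=0),
             pickle.dumps("xKITTENSx"[1:-1], protocol=0)):
    print(repr(data), memo_writes(data))
```

A memo-storing opcode that appears in one dump but not in the other is exactly the kind of difference described above.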

    myhash = hash(cPickle.dumps(arguments))
    try:
        cached_args, cached_value = cache[myhash]
        if cached_args == arguments:
            return cached_value
    except KeyError:
        pass
    # Not cached, or a different argument set with the same hash.
    value = Function(*arguments)
    cache[myhash] = arguments, value
    return value

The non-uniqueness of the pickle string will cause unnecessary cache misses in this code. Pickling is useful as a precursor because it allows for more varied object types than hash() alone would.
I just wanted to point this out. We'll attempt some local workarounds here, but it should otherwise be simple to modify pickling to optionally turn off this optimization and always generate the same output irrespective of the internal reference counts of the objects.
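One possible local workaround, offered as a sketch under stated assumptions and not necessarily the one meant above, is to pass the pickle through `pickletools.optimize`, which drops memo-storing opcodes that nothing later refers back to. The names `stable_key` and `cached_call` are hypothetical, and the code assumes Python 3's `pickle` module. It does not make pickles canonical in general: arguments that share sub-objects by identity can still pickle differently from equal arguments that do not.

```python
import pickle
import pickletools

cache = {}

def stable_key(arguments):
    """Pickle `arguments`, then strip memo opcodes that are never read back."""
    return pickletools.optimize(pickle.dumps(arguments, protocol=2))

def cached_call(function, *arguments):
    """Call `function(*arguments)`, reusing a stored result when possible."""
    # The pickle bytes themselves are the key, so no separate check for
    # hash collisions between different argument sets is needed.
    key = stable_key(arguments)
    try:
        return cache[key]
    except KeyError:
        value = cache[key] = function(*arguments)
        return value
```

Using the bytes as the dictionary key costs more memory than storing only their hash, which is the trade-off against the collision check in the original snippet.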
Cheers,
Kristján